MOLECULAR SEQUENCE OF SWINE RETROVIRUS
AND METHODS OF USE 

This application is a continuation-in-part of U.S. S.N. 08/572,645, filed December
14, 1995, which is hereby incorporated by reference.

Field of the Invention 
The invention relates to porcine retroviral sequences, peptides encoded by porcine 
retroviral sequences, and methods of using the porcine retroviral nucleic acids and peptides. 

Background of the Invention

Advances in solid organ transplantation and a chronic shortage of suitable organ 
donors have made xenotransplantation an attractive alternative to the use of human 
allografts. However, the potential for introduction of a new group of infectious diseases 
from donor animals into the human population is a concern with the use of these methods. 

The term applied to the natural acquisition by humans of infectious agents carried
by other species is zoonosis. The transplantation of infection from nonhuman species into
humans is best termed "direct zoonosis" or "xenosis."

Nonhuman primates and swine have been considered the main potential sources of
organs for xenotransplantation (Niekrasz et al. (1992) Transplant Proc 24:625; Starzl et al.
(1993) Lancet 341:65; Murphy et al. (1970) Trans Proc 4:546; Brede and Murphy (1972)
Primates Med 7:18; Cooper et al. In Xenotransplantation: The Transplantation of Organs
and Tissues between Species, eds. Cooper et al. (1991) p. 457; RY Calne (1970) Transplant
Proc 2:550; H. Auchincloss, Jr. (1988) Transplantation 46:1; and Chiche et al. (1993)
Transplantation 6:1418). The infectious disease issues for primates and swine are similar
to those of human donors. The prevention of infection depends on the ability to predict, to
recognize, and to prevent common infections in the immunocompromised transplantation
recipient (Rubin et al. (1993) Antimicrob Agents Chemother 37:619). Because of the
potential carriage by nonhuman primates of pathogens easily adapted to humans, ethical
concerns, and the cost of maintaining large colonies of primates, other species have
received consideration as organ donors (Brede and Murphy (1972) Primates Med 7:18; Van
Der Riet et al. (1987) Transplant Proc 19:4069; tCatler In Xenotransplantation: The
Transplantation of Organs and Tissues between Species, eds. Cooper et al. (1991) p. 457;
Metzger et al. (1981) J Immunol 127:769; McClure et al. (1987) Nature 330:487; Letvin et
al. (1987) J Infect Dis 156:406; Castro et al. (1991) Virology 184:219; Benveniste and
Todaro (1973) Proc Natl Acad Sci USA 70:3316; and Teich, in RNA Tumor Viruses, eds.
Weiss et al. (1985) p. 25). The economic importance of swine and experience in studies of
transplantation in the miniature swine model have allowed some of the potential pathogens
associated with these animals to be defined (Niekrasz et al. (1992) Transplant Proc 24:625;
Cooper et al. In Xenotransplantation: The Transplantation of Organs and Tissues between
Species, eds. Cooper et al. (1991) p. 457; and Leman et al. (1992) Diseases of Swine 7th
ed. Ames, Iowa: Iowa State University). Miniature swine have received consideration as
organ donors because of a number of features of the species. The structure and function of
the main pig organs are comparable to those of man. Swine attain body weights and organ
sizes adequate to the provision of organs for human use. Lastly, veterinarians and
commercial breeders have developed approaches to creation of specific-pathogen-free
(SPF) swine with the ability to eliminate known pathogens from breeding colonies
(Alexander et al. (1980) Proc 6th Int Congr Pig Vet Soc, Copenhagen; Betts (1961) Vet Rec
73:1349; Betts et al. (1960) Vet Rec 72:461; Caldwell et al. (1959) J Am Vet Med Assoc
135:504; and Young (1964) Adv Vet Sci 9:61).

Concern exists over the transfer of porcine retroviruses by xenotransplantation 
(Smith (1993) N Engl J Med 328:141). Many of the unique properties of the retroviruses 
are due to the synthesis of a complementary DNA copy from the RNA template (by reverse 
transcriptase), and integration of this DNA into the host genome. The integrated retroviral
copy (which is referred to as an endogenous copy or "provirus") can be transmitted via the 
germ line. 

Summary of the Invention 

In general, the invention features a purified swine or miniature swine retroviral
nucleic acid, e.g., a Tsukuba nucleic acid, a purified miniature swine retroviral nucleic acid
sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID
NO:3 or its complement, and methods of their use in detecting the presence of porcine, e.g.,
miniature swine, retroviral sequences.

In another aspect, the invention features a purified nucleic acid, e.g., a probe or
primer, which can specifically hybridize with a purified swine or miniature swine retroviral
genome, e.g., a Tsukuba genome, the sequence of SEQ ID NO:1 or its complement, SEQ
ID NO:2 or its complement, or SEQ ID NO:3 or its complement.
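
The recurring phrase "or its complement" can be illustrated computationally. The following is a minimal sketch, not part of the disclosed methods: the function names are hypothetical, the example sequence is made up (it is not SEQ ID NO:1, 2, or 3), and exact base pairing is assumed as a stand-in for hybridization.

```python
# Illustrative sketch only. Sequences used with these helpers are invented
# examples, not SEQ ID NO:1-3. U is treated like T so RNA can be read as DNA.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_complementary(probe: str, target: str) -> bool:
    """True if probe is the exact reverse complement of target.

    Exact pairing is a simplification; real hybridization tolerates
    mismatches depending on stringency.
    """
    return probe.upper() == reverse_complement(target).upper()
```

For instance, `reverse_complement("ATGC")` evaluates to `"GCAT"`, so a probe of sequence GCAT would pair with a target of sequence ATGC under this simplified model.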

In preferred embodiments the nucleic acid is other than the entire retroviral genome
of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or
its complement, e.g., it is at least 1 nucleotide longer, or at least 1 nucleotide shorter, or
differs in sequence at at least one position, e.g., the nucleic acid is a fragment of the
sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID
NO:3 or its complement, or it includes sequence additional to that of SEQ ID NO:1 or its
complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement.

In preferred embodiments, the nucleic acid has at least 60%, 70%, 72%, more
preferably at least 85%, more preferably at least 90%, more preferably at least 95%, most
preferably at least 98%, 99% or 100% sequence identity or homology with a sequence from
SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its
complement.
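
As a rough illustration of how a percent sequence identity threshold might be checked, the sketch below compares two sequences that are assumed to be already aligned and of equal length. The text does not specify an alignment method or identity formula, so the function names, the treatment of gap columns as non-matches, and the choice of alignment length as denominator are all assumptions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: columns containing a gap character ('-') count as
    non-matches, and the denominator is the full alignment length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(
        1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(a)


def meets_identity(a: str, b: str, threshold: float = 85.0) -> bool:
    """True if the two aligned sequences share at least `threshold` % identity."""
    return percent_identity(a, b) >= threshold
```

Two aligned 10-mers that differ at one position give 90% identity under this definition, which satisfies the 85% tier and the 90% tier but not the 95% tier.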

In other embodiments: the sequence of the nucleic acid differs from the
corresponding sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its
complement, or SEQ ID NO:3 or its complement, by 1, 2, 3, 4, or 5 base pairs; the
sequence of the nucleic acid differs from the corresponding sequence of SEQ ID NO:1 or
its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement, by
at least 1, 2, 3, 4, or 5 base pairs but less than 6, 7, 8, 9, or 10 base pairs.

In other preferred embodiments: the nucleic acid is at least 10, more preferably at
least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100, 1000, 2000,
4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15, more preferably
less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000, 6000, or 8060
nucleotides in length.

In yet other preferred embodiments: the nucleic acid can specifically hybridize with
a translatable region of a miniature swine retroviral genome, e.g., the retroviral genome of
SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its
complement, e.g., a region from the gag, pol, or env gene; the probe or primer can
specifically hybridize with an untranslated region of a miniature swine retroviral genome,
e.g., the retroviral genome of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its
complement, or SEQ ID NO:3 or its complement; the probe or primer can specifically
hybridize with a non-conserved region of a miniature swine retroviral genome, e.g., the
retroviral genome of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement,
or SEQ ID NO:3 or its complement; the probe or primer can specifically hybridize with the
highly conserved regions of a miniature swine retroviral genome, e.g., the retroviral
genome of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID
NO:3 or its complement.

In preferred embodiments, the primer is selected from the group consisting of SEQ 
ID NOs:4-74. 

In preferred embodiments, hybridization of the probe to retroviral sequences can be
detected by standard methods, e.g., by radiolabeled probes or by probes bearing
nonradioactive markers such as enzymes or antibody binding sites. For example, a probe
can be conjugated with an enzyme such as horseradish peroxidase, where the enzymatic
activity of the conjugated enzyme is used as a signal for hybridization. Alternatively, the
probe can be coupled to an epitope recognized by an antibody, e.g., an antibody conjugated
to an enzyme or another marker.

In another aspect, the invention features a reaction mixture which includes a target
nucleic acid, e.g., a human, swine, or a miniature swine nucleic acid, and a purified second
nucleic acid, e.g., a probe or primer, as, e.g., is described herein, which specifically
hybridizes with the sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its
complement, or SEQ ID NO:3 or its complement, a swine or a miniature swine retroviral
nucleic acid, e.g., a Tsukuba nucleic acid.

In preferred embodiments, the target nucleic acid: includes RNA; or includes DNA.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated
from a miniature swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated
from a miniature swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template,
isolated from a miniature swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a miniature swine potential donor organ; RNA,
DNA or cDNA, e.g., cDNA made from an RNA template, isolated from a miniature swine
organ which has been transplanted into an organ recipient, e.g., a xenogeneic recipient, e.g.,
a primate, e.g., a human.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated
from a swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA made from an RNA
template, isolated from a swine potential donor organ; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a swine organ which has been transplanted into
an organ recipient, e.g., a xenogeneic recipient, e.g., a primate, e.g., a human.

In a preferred embodiment: the second nucleic acid is a porcine retroviral sequence,
probe or primer, e.g., as described herein, e.g., a Tsukuba-1 retroviral sequence; the second
nucleic acid is a sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its
complement, or SEQ ID NO:3 or its complement, or a fragment of the sequence or
complement at least 10, 20, or 30 base pairs in length.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%,
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the second nucleic acid is at least 10, more
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100,
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome.

In preferred embodiments the second nucleic acid is: a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a gag protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from



WO 97/21836 



PCT/US96/19680



nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides
598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g.,
from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of
SEQ ID NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2,
or nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof.
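
The nucleotide ranges recited above can be read as 1-based, inclusive coordinates, from which candidate stretches of "at least 10 consecutive nucleotides" of sense or antisense sequence could be enumerated. The sketch below is a hypothetical illustration: the coordinate convention is an assumption, the helper names are invented, only the SEQ ID NO:1 ranges are tabulated (copied from the text as given), and the demonstration sequence is synthetic rather than any of SEQ ID NO:1-3.

```python
# Ranges for SEQ ID NO:1 as recited in the text, assumed 1-based inclusive.
SEQ1_REGIONS = {
    "gag": (2452, 4839),
    "pol": (4871, 8060),
    "env": (2, 1999),
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the antisense (reverse-complement) strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def region(seq: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive nucleotide range out of seq."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range lies outside the sequence")
    return seq[start - 1:end]


def candidate_probes(seq, start, end, k=10, antisense=False):
    """List every run of k consecutive nucleotides within the range."""
    sub = region(seq, start, end)
    if antisense:
        sub = reverse_complement(sub)
    return [sub[i:i + k] for i in range(len(sub) - k + 1)]


# Synthetic 20-nt stand-in; a real call would pass the full sequence and,
# e.g., *SEQ1_REGIONS["gag"] as the start and end arguments.
demo = "ACGT" * 5
probes = candidate_probes(demo, 1, 12, k=10)
```

A range of length n yields n - k + 1 overlapping candidates, so the 12-nucleotide demonstration range gives three 10-mers.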

In another aspect, the invention features a method for screening a cell or a tissue,
e.g., a cellular or tissue transplant, e.g., a xenograft, for the presence or expression of a
swine or a miniature swine retrovirus or retroviral sequence, e.g., an endogenous miniature
swine retrovirus. The method includes:

contacting a target nucleic acid from the tissue with a second sequence chosen from
the group of: a sequence which can specifically hybridize to a porcine retroviral sequence; a
sequence which can specifically hybridize to the sequence of SEQ ID NO:1 or its
complement; a sequence which can specifically hybridize to the sequence of SEQ ID NO:2
or its complement; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:3 or its complement; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence which encodes a gag protein; a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence from nucleotides 2452-4839 (e.g., from
nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from nucleotides
598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides 585-2156) of
SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a pol protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737 of SEQ ID NO:2, or
nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic
acid of at least 10 consecutive nucleotides of sense or antisense sequence which encodes an
env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID NO:1,
nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or nucleotides
5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine or miniature
swine retroviral nucleic acid; or a Tsukuba nucleic acid under conditions in which
hybridization can occur, hybridization being indicative of the presence or expression of an
endogenous miniature swine retrovirus or retroviral sequence in the tissue or an
endogenous swine retrovirus in the tissue.

In preferred embodiments, the method further includes amplifying the target nucleic
acid with primers which specifically hybridize to the sequence of SEQ ID NO:1 or its
complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement.
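
The amplification step can be pictured with a simple in silico model. The sketch below is an assumption-laden illustration, not the disclosed procedure: it requires exact primer matches, searches only the given strand of the template, and uses invented primer and template sequences (not SEQ ID NOs:4-74). The function name `find_amplicon` is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_amplicon(template: str, forward: str, reverse: str):
    """Return the predicted product of a primer pair, or None.

    The forward primer must occur in the template as written; the reverse
    primer must pair with the template downstream of it, i.e. its reverse
    complement must occur in the template after the forward site.
    """
    t = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse.upper())
    start = t.find(fwd)
    if start < 0:
        return None
    end = t.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return t[start:end + len(rev_site)]


# Made-up template: flank + forward site + insert + reverse site + flank.
template = "TTTT" + "ACGGA" + "CCCCCC" + "GATTC" + "AAAA"
product = find_amplicon(template, "ACGGA", "GAATC")
```

Here the reverse primer GAATC is the reverse complement of the template site GATTC, so the predicted product spans both primer sites and the six-nucleotide insert between them. If either primer finds no site, the function returns `None`, which in this simplified picture corresponds to no amplification.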

In preferred embodiments, the tissue or cellular transplant is selected from the group 

consisting of: heart, lung, liver, bone marrow, kidney, brain cells, neural tissue, pancreas or 

pancreatic cells, thymus, or intestinal tissue. 

In other preferred embodiments, the target nucleic acid is: DNA; RNA; or cDNA.

In other preferred embodiments, the target nucleic acid is taken from: a tissue
sample, or a blood sample, e.g., a tissue biopsy sample, e.g., a tissue sample suitable for in
situ hybridization or immunohistochemistry.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated
from a miniature swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated
from a miniature swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template,
isolated from a miniature swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a miniature swine potential donor organ; RNA,
DNA or cDNA, e.g., cDNA made from an RNA template, isolated from a miniature swine
organ which has been transplanted into an organ recipient, e.g., a xenogeneic recipient, e.g.,
a primate, e.g., a human.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated
from a swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA made from an RNA
template, isolated from a swine potential donor organ; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a swine organ which has been transplanted into
an organ recipient, e.g., a recipient swine or a xenogeneic recipient, e.g., a primate, e.g., a
human.

In a preferred embodiment the target nucleic acid is RNA, or a nucleic acid
amplified from RNA in the tissue, and hybridization is correlated with expression of an
endogenous miniature swine retrovirus or retroviral sequence or an endogenous swine
retrovirus.

In a preferred embodiment the target nucleic acid is DNA, or a nucleic acid
amplified from DNA in the tissue, and hybridization is correlated with the presence of an
endogenous miniature swine retrovirus or an endogenous swine retrovirus.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%,
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the second nucleic acid is at least 10, more
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100,
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome.

In another aspect, the invention features a method of screening a porcine derived
cell or tissue for the presence of an activatable porcine retrovirus, e.g., an activatable
porcine provirus. The method includes:

stimulating a porcine derived cell or tissue with a treatment which can activate a 
retrovirus; 

contacting a target nucleic acid from the porcine derived cell or tissue with a second
sequence chosen from the group of: a sequence which can specifically hybridize to a
porcine retroviral sequence; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:1 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:2 or its complement; a sequence which can specifically hybridize
to the sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a gag protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides
598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g.,
from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of
SEQ ID NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2,
or nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine
or miniature swine retroviral nucleic acid; or a Tsukuba nucleic acid, hybridization being
indicative of the presence of an activatable porcine provirus in the porcine derived cell or
tissue.

In preferred embodiments the treatment is: contact with a drug, e.g., a steroid or a
cytotoxic agent, infection or contact with a virus, the induction of stress, e.g., nutritional
stress or immunologic stress, e.g., contact with a T-cell, e.g., a reactive T-cell.

In preferred embodiments, the method further includes amplifying the target nucleic
acid with primers which specifically hybridize to the sequence of SEQ ID NO:1 or its
complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement.

In other preferred embodiments, the target nucleic acid is taken from: a tissue
sample, or a blood sample, e.g., a tissue biopsy sample, e.g., a tissue sample suitable for in
situ hybridization or immunohistochemistry.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated
from a miniature swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated
from a miniature swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template,
isolated from a miniature swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a miniature swine potential donor organ; RNA,
DNA or cDNA, e.g., cDNA made from an RNA template, isolated from a miniature swine
organ which has been transplanted into an organ recipient, e.g., a xenogeneic recipient, e.g.,
a primate, e.g., a human.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated
from a swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA made from an RNA
template, isolated from a swine potential donor organ; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a swine organ which has been transplanted into
an organ recipient, e.g., a recipient swine or a xenogeneic recipient, e.g., a primate, e.g., a
human.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%,
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the second nucleic acid is at least 10, more
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100,
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome.

In another aspect, the invention features a method for screening a miniature swine
genome or a swine genome for the presence of a porcine retrovirus or retroviral sequence,
e.g., an endogenous porcine retrovirus. The method includes:

contacting the miniature swine (or swine) genomic DNA with a second sequence
chosen from the group of: a sequence which can specifically hybridize to a porcine
retroviral sequence; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:1 or its complement; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:2 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence which encodes a gag protein; a nucleic acid of
at least 10 consecutive nucleotides of sense or antisense sequence from nucleotides
2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from
nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides
585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic acid of at
least 10 consecutive nucleotides of sense or antisense sequence which encodes a pol
protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence
from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737 of SEQ ID NO:2, or
nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic
acid of at least 10 consecutive nucleotides of sense or antisense sequence which encodes an
env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID NO:1,
nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or nucleotides
5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine or miniature
swine retroviral nucleic acid; or a Tsukuba nucleic acid under conditions in which the
sequences can hybridize, hybridization being indicative of the presence of the endogenous
porcine retrovirus or retroviral sequence in the miniature swine (or swine) genome.

In preferred embodiments, the method further includes amplifying all or a portion of
the miniature swine (or swine) genome with primers which specifically hybridize to the
sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID
NO:3 or its complement.

In a preferred embodiment: the second nucleic acid is a porcine retroviral sequence,
probe or primer, e.g., as described herein, e.g., a Tsukuba-1 retroviral sequence; the second
nucleic acid is a sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its
complement, or SEQ ID NO:3 or its complement, or a fragment of the sequence or
complement at least 10, 20, or 30 base pairs in length.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%,
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the second nucleic acid is at least 10, more
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100,
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome.

In another aspect, the invention features a method for screening a genetically 
modified miniature swine or a genetically modified swine for the presence or expression of 
a miniature swine or swine retrovirus or retroviral sequence, e.g., an endogenous miniature 
swine retrovirus. The method includes: 

contacting a target nucleic acid from the genetically modified miniature swine or
swine with a second sequence chosen from the group of: a sequence which can specifically
hybridize to a porcine retroviral sequence; a sequence which can specifically hybridize to
the sequence of SEQ ID NO:1 or its complement; a sequence which can specifically
hybridize to the sequence of SEQ ID NO:2 or its complement; a sequence which can
specifically hybridize to the sequence of SEQ ID NO:3 or its complement; a nucleic acid
of at least 10 consecutive nucleotides of sense or antisense sequence which encodes a gag
protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence
from nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1,
nucleotides 598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides
585-2156 (e.g., from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of
SEQ ID NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2,
or nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine
or miniature swine retroviral nucleic acid; or a Tsukuba nucleic acid under conditions in
which hybridization can occur, hybridization being indicative of the presence or expression
of an endogenous miniature swine retrovirus or retroviral sequence or swine retrovirus or
retroviral sequence in the genetically modified miniature swine or swine.

In preferred embodiments, the method further includes amplifying the target nucleic
acid with primers which specifically hybridize to the sequence of SEQ ID NO:1 or its
complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated
from a miniature swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated
from a miniature swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template,
isolated from a miniature swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a miniature swine potential donor organ; RNA,
DNA or cDNA, e.g., cDNA made from an RNA template, isolated from a miniature swine
organ which has been transplanted into an organ recipient, e.g., a xenogeneic recipient, e.g.,
a primate, e.g., a human.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated
from a swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA made from an RNA
template, isolated from a swine potential donor organ; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a swine organ which has been transplanted into
an organ recipient, e.g., a recipient swine or a xenogeneic recipient, e.g., a primate, e.g., a
human.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%,
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.
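
As an illustration only, the percent identity thresholds recited above can be computed for two pre-aligned sequences of equal length. The following Python sketch is not part of the disclosed methods: the function names `reverse_complement`, `percent_identity` and `best_identity` are hypothetical, and it assumes the sequences have already been aligned (a real comparison would first use an alignment tool).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def best_identity(probe: str, reference: str) -> float:
    """Identity of the probe to the reference or to its complement, whichever is higher."""
    return max(
        percent_identity(probe, reference),
        percent_identity(probe, reverse_complement(reference)),
    )
```

A candidate sequence would then be compared against a chosen threshold, e.g., `best_identity(probe, reference) >= 85.0`.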

In other preferred embodiments: the second nucleic acid is at least 10, more
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100,
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome.

In another aspect, the invention features a method of assessing the potential risk
associated with the transplantation of a graft from a donor miniature swine or swine into a 
recipient animal, e.g., a miniature swine or swine, a non-human primate, or a human. The 
method includes: 

contacting a target nucleic acid from the donor, recipient or the graft, with a second
sequence chosen from the group of: a sequence which can specifically hybridize to a
porcine retroviral sequence; a sequence which can specifically hybridize to the sequence
of SEQ ID NO:1 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:2 or its complement; a sequence which can specifically hybridize
to the sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence which encodes a gag protein; a nucleic acid of
at least 10 consecutive nucleotides of sense or antisense sequence from nucleotides
2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169
(e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from
nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;

a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which 
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring 
mutants thereof; 

a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which 
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine or
miniature swine retroviral nucleic acid; or a Tsukuba nucleic acid under conditions in 
which the sequences can hybridize, hybridization being indicative of a risk associated with 
the transplantation. 
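
The recurring limitation "at least 10 consecutive nucleotides of sense or antisense sequence from nucleotides X-Y" can be checked computationally. The sketch below is a hypothetical illustration, not a procedure taken from this specification: the function name `is_region_fragment` and the toy sequence in the usage note are assumptions, coordinates are treated as 1-based and inclusive, and only exact matches are considered (naturally occurring mutants would need a tolerant comparison).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def is_region_fragment(probe: str, genome: str, start: int, end: int,
                       min_len: int = 10) -> bool:
    """True if the probe has at least min_len nucleotides and matches, in sense
    or antisense orientation, a stretch lying wholly within nucleotides
    start..end (1-based, inclusive) of the genome."""
    probe = probe.upper()
    if len(probe) < min_len:
        return False
    region = genome.upper()[start - 1:end]
    # Sense: the probe itself is in the region.
    # Antisense: the probe's reverse complement is in the region.
    return probe in region or reverse_complement(probe) in region
```

For example, with a toy genome `"TTTTACGTTGCAAGGCTACCCC"` and the region 5-18, the 10-mer `"ACGTTGCAAG"` qualifies in either orientation, while a 9-mer or a probe overlapping nucleotides 1-4 does not.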

In a preferred embodiment: the second nucleic acid is a Tsukuba-1 retroviral
sequence, probe or primer, e.g., as described herein; the second nucleic acid is a porcine
retroviral sequence, probe or primer, e.g., as described herein; the second nucleic acid is the
sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID
NO:3 or its complement, or a fragment of the sequence or complement at least 10, 20, or
30 basepairs in length.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated 
from a miniature swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated 
from a miniature swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template,
isolated from a miniature swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a miniature swine potential donor organ; RNA,
DNA or cDNA, e.g., cDNA made from an RNA template, isolated from a miniature swine
organ which has been transplanted into an organ recipient, e.g., a xenogeneic recipient, e.g., a
primate, e.g., a human.

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated 
from a swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA made from an RNA
template, isolated from a swine potential donor organ; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a swine organ which has been transplanted into
an organ recipient, e.g., a recipient swine or a xenogeneic recipient, e.g., a primate, e.g., a
human.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%,
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the second nucleic acid is at least 10, more 
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100, 
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15, 
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome. 

In another aspect, the invention features a method of determining if an endogenous 
miniature swine or swine retrovirus or retroviral sequence genome includes a mutation 
which modulates its expression, e.g., results in misexpression. The method includes:

determining the structure of the endogenous retroviral genome, and

comparing the structure of the endogenous retroviral genome with the retroviral
sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID
NO:3 or its complement, a difference being predictive of a mutation.

In preferred embodiments the method includes sequencing the endogenous genome
and comparing it with a sequence from SEQ ID NO:1 or its complement, SEQ ID NO:2 or
its complement, or SEQ ID NO:3 or its complement.
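
The comparison of a sequenced endogenous genome with a reference sequence can be sketched as follows. This is a hypothetical illustration rather than the method of the specification: the function name `list_differences` is an assumption, the sequences in the usage note are toy examples and not the sequences of SEQ ID NOS:1-3, and Python's standard `difflib` matcher stands in for a dedicated sequence alignment tool.

```python
from difflib import SequenceMatcher


def list_differences(reference: str, sample: str):
    """List differences between a reference and a sample sequence.

    Each entry is (kind, position, reference_bases, sample_bases), where kind
    is 'replace', 'delete' or 'insert' and position is the 1-based coordinate
    in the reference at which the difference starts.
    """
    # autojunk=False disables difflib's heuristic that ignores frequent
    # characters, which would otherwise distort matches on a 4-letter alphabet.
    matcher = SequenceMatcher(None, reference, sample, autojunk=False)
    return [
        (tag, i1 + 1, reference[i1:i2], sample[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For instance, `list_differences("ACGTACGT", "ACGTTCGT")` reports a single substitution at reference position 5, and an empty list indicates no detected difference.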

In preferred embodiments, the method includes using primers to amplify, e.g., by
PCR, LCR (ligase chain reaction), or other amplification methods, a region of the
endogenous retroviral genome, and comparing the structure of the amplification product to
the sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ
ID NO:3 or its complement to determine if there is a difference in sequence between the
retroviral genome and SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement,
or SEQ ID NO:3 or its complement. The method further includes determining if one or
more restriction sites exist in the endogenous retroviral genome, and determining if the
sites exist in SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID
NO:3 or its complement.
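
Determining whether restriction sites present in one sequence also exist in another can be illustrated with a simple site search. The sketch below is an assumption-laden example, not a protocol from this specification: the function names `find_sites` and `compare_site_counts` are hypothetical, and the enzymes shown (EcoRI, recognition site GAATTC; BamHI, recognition site GGATCC) were chosen only for illustration.

```python
def find_sites(sequence: str, site: str):
    """Return the 1-based start positions of every occurrence of a recognition site."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        # Resume one base later so overlapping occurrences are also found.
        start = sequence.find(site, start + 1)
    return positions


def compare_site_counts(endogenous: str, reference: str, sites: dict):
    """Return {enzyme: (count in endogenous, count in reference)} for enzymes
    whose number of recognition sites differs between the two sequences."""
    differing = {}
    for enzyme, site in sites.items():
        counts = (len(find_sites(endogenous, site)),
                  len(find_sites(reference, site)))
        if counts[0] != counts[1]:
            differing[enzyme] = counts
    return differing
```

Counts rather than positions are compared here because insertions or deletions elsewhere in the genome shift site coordinates; a gain or loss of a site is what would alter a restriction digest pattern.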

In preferred embodiments, the mutation is a gross defect, e.g., an insertion,
inversion, translocation or a deletion, of all or part of the retroviral genome.

In preferred embodiments, detecting the mutation can include: (i) providing a
labeled PCR probe amplified from DNA (e.g., SEQ ID NO:1, SEQ ID NO:2, or SEQ ID
NO:3) containing a porcine retroviral nucleotide sequence which hybridizes to a sense or
antisense sequence from the porcine retroviral genome (e.g., SEQ ID NO:1, SEQ ID NO:2,
or SEQ ID NO:3), or naturally occurring mutants thereof; (ii) exposing the probe/primer to
nucleic acid of the tissue (e.g., genomic DNA) digested with a restriction endonuclease; and
(iii) detecting by in situ hybridization of the probe/primer to the nucleic acid, the presence
or absence of the genetic lesion. Alternatively, direct PCR analysis, using primers specific
for porcine retroviral genes (e.g., genes comprising the nucleotide sequence shown in SEQ
ID NO:1, SEQ ID NO:2, or SEQ ID NO:3), can be used to detect the presence or absence
of the genetic lesion in the porcine retroviral genome by comparing the products amplified.

In another aspect, the invention features a method of providing a miniature swine or 
a swine free of an endogenous retrovirus or retroviral sequence, e.g., activatable retrovirus,
insertion at a preselected site. The method includes: 

performing a breeding cross between a first miniature swine (or swine) having a 
retroviral insertion at the preselected site and a second miniature swine (or swine) not 
having a retroviral insertion at a preselected site, e.g., the same site, and recovering a 
progeny miniature swine (or swine), not having the insertion, wherein the presence or 
absence of the retroviral insertion is determined by contacting the genome of a miniature 
swine (or swine) with a sequence which can specifically hybridize to a porcine retroviral
sequence; a sequence which can specifically hybridize to the sequence of SEQ ID NO:1 or
its complement; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:2 or its complement; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:3 or its complement; a nucleic acid of at least 10 consecutive nucleotides of
sense or antisense sequence which encodes a gag protein; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence from nucleotides 2452-4839 (e.g.,
from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from
nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides
585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;

a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of
SEQ ID NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2,
or nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine
or miniature swine retroviral nucleic acid; or a Tsukuba nucleic acid. 

In preferred embodiments, the nucleic acid is hybridized to nucleic acid, e.g., DNA 
from the genome, of the first animal or one of its ancestors.

In preferred embodiments, the nucleic acid is hybridized to nucleic acid, e.g., DNA
from the genome, of the second animal or one of its ancestors.

In preferred embodiments, the nucleic acid is hybridized to nucleic acid, e.g., DNA 
from the genome, of the progeny animal or one of its descendants. 

In preferred embodiments, the nucleic acid has at least 60%, 70%, 72%, more
preferably at least 85%, more preferably at least 90%, more preferably at least 95%, most
preferably at least 98%, 99% or 100% sequence identity or homology with a sequence from
SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its
complement. 

In other preferred embodiments: the nucleic acid is at least 10, more preferably at 
least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100, 1000, 2000,
4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15, more preferably
less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000, 6000, or 8060 
nucleotides in length; the nucleic acid is a full length retroviral genome. 

In another aspect, the invention features a method of evaluating a treatment, e.g., an 
immunosuppressive treatment, for the ability to activate a retrovirus, e.g., an endogenous
porcine retrovirus. The method includes: 

administering a treatment to a subject, e.g., a miniature swine (or a swine), having 
an endogenous porcine retrovirus; and 

detecting expression of the porcine retrovirus with a purified nucleic acid sequence
which specifically hybridizes to the sequence of SEQ ID NO:1 or its complement, SEQ ID
NO:2 or its complement, or SEQ ID NO:3 or its complement. 

In preferred embodiments, the immunosuppressive treatment includes radiation,
chemotherapy or drug treatment. 

In preferred embodiments: the treatment is one which can induce immunological 
tolerance; the treatment is one which can introduce new genetic material, e.g., introduce
new genetic material into a miniature swine genome (or a swine genome) or into the 
genome of a host which receives a swine or a miniature swine graft, e.g., the treatment is 
one which introduces a new genetic material via retroviral mediated transfer. 

In a preferred embodiment: the purified nucleic acid is a Tsukuba-1 retroviral
sequence, probe or primer, e.g., as described herein; the purified nucleic acid is a porcine
retroviral sequence, probe or primer, e.g., as described herein; the purified nucleic acid is
the sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ
ID NO:3 or its complement, or a fragment of such sequence or complement at least 10, 20,
or 30 basepairs in length.

In preferred embodiments, the purified nucleic acid has at least 60%, 70%, 72%,
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the purified nucleic acid is at least 10, more
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100,
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the purified nucleic acid is a full length retroviral 
genome. 

In preferred embodiments the second nucleic acid is: a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a gag protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides
598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g.,
from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;

a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of
SEQ ID NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2,
or nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof.

In another aspect, the invention features a method of localizing the origin of a
porcine retroviral infection. The method includes:

contacting a target nucleic acid from the graft with a second sequence chosen from
the group of: a sequence which can specifically hybridize to a porcine retroviral sequence;
a sequence which can specifically hybridize to the sequence of SEQ ID NO:1 or its
complement; a sequence which can specifically hybridize to the sequence of SEQ ID NO:2
or its complement; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:3 or its complement; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence which encodes a gag protein; a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence from nucleotides 2452-4839 (e.g., from
nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from nucleotides
598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides 585-2156) of
SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a pol protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737 of SEQ ID NO:2, or
nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring mutants thereof;

a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine or
miniature swine retroviral nucleic acid; or a Tsukuba nucleic acid;

contacting a target nucleic acid from the recipient with a second sequence chosen from
the group of: a sequence which can specifically hybridize to a porcine retroviral sequence;
a sequence which can specifically hybridize to the sequence of SEQ ID NO:1 or its
complement; a sequence which can specifically hybridize to the sequence of SEQ ID NO:2
or its complement; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:3 or its complement; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence which encodes a gag protein; a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence from nucleotides 2452-4839 (e.g., from
nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from nucleotides
598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides 585-2156) of
SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a pol protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737 of SEQ ID NO:2, or
nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic
acid of at least 10 consecutive nucleotides of sense or antisense sequence which encodes an
env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID NO:1,
nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or nucleotides
5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine or miniature
swine retroviral nucleic acid; or a Tsukuba nucleic acid; hybridization to the nucleic acid
from the graft correlates with the porcine retroviral infection in the graft; and hybridization
to the nucleic acid from the recipient correlates with the porcine retroviral infection in the
recipient.

In preferred embodiments, the target nucleic acid includes: genomic DNA, RNA or
cDNA, e.g., cDNA made from an RNA template.

In a preferred embodiment: the second nucleic acid is a porcine retroviral sequence, 
probe or primer, e.g., as described herein, e.g., a Tsukuba-1 retroviral sequence; the second 
nucleic acid is a sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its
complement, or SEQ ID NO:3 or its complement, or a fragment of the sequence or 
complement at least 10, 20, or 30, basepairs in length. 

In preferred embodiments, the recipient is an animal, e.g., a miniature swine, a 
swine, a non-human primate, or a human.

In preferred embodiments, the graft is selected from the group consisting of: heart,
lung, liver, bone marrow or kidney.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%, 
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%, 
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence from
SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its
complement. 

In other preferred embodiments: the second nucleic acid is at least 10, more 
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100,
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral 
genome. 

In another aspect, the invention features a method of screening a cell, e.g., a cell 
having a disorder, e.g., a proliferative disorder, e.g., a tumor cell, e.g., a cancer cell, e.g., a
lymphoma or a hepatocellular carcinoma, developing in a graft recipient, e.g., a xenograft, 
for the presence or expression of a porcine retrovirus or retroviral sequence. The method 
includes: 

contacting a target nucleic acid from a tumor cell with a second sequence chosen
from the group of: a sequence which can specifically hybridize to a porcine retroviral
sequence; a sequence which can specifically hybridize to the sequence of SEQ ID NO:1 or
its complement; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:2 or its complement; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:3 or its complement; a nucleic acid of at least 10 consecutive nucleotides of
sense or antisense sequence which encodes a gag protein; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence from nucleotides 2452-4839 (e.g.,
from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from
nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides
585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic acid of at
least 10 consecutive nucleotides of sense or antisense sequence which encodes a pol
protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence
from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737 of SEQ ID NO:2, or
nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic
acid of at least 10 consecutive nucleotides of sense or antisense sequence which encodes an
env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID NO:1,
nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or nucleotides
5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine or miniature
swine retroviral nucleic acid; or a Tsukuba nucleic acid, under conditions in which the
sample and the nucleic acid sequence can hybridize, hybridization being indicative of the
presence of the endogenous porcine retrovirus or retroviral sequence in the tumor cell.

In preferred embodiments, the target nucleic acid from a tumor cell includes:
genomic DNA, RNA or cDNA, e.g., cDNA made from an RNA template.

In a preferred embodiment: the second nucleic acid is a porcine retroviral sequence,
probe or primer, e.g., as described herein, e.g., a Tsukuba-1 retroviral sequence; the second
nucleic acid is a sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its
complement, or SEQ ID NO:3 or its complement, or a fragment of the sequence or
complement at least 10, 20, or 30 basepairs in length.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%, 
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%, 
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence 
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the second nucleic acid is at least 10, more
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100,
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome. 

In another aspect, the invention features a method of screening a human subject for 
the presence or expression of an endogenous porcine retrovirus or retroviral sequence 
comprising: 

contacting a target nucleic acid derived from the human subject with a second
sequence chosen from the group of: a sequence which can specifically hybridize to a
porcine retroviral sequence; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:1 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:2 or its complement; a sequence which can specifically hybridize
to the sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a gag protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides
598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g.,
from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of
SEQ ID NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2,
or nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine
or miniature swine retroviral nucleic acid; or a Tsukuba nucleic acid under conditions in
which the sequences can hybridize, hybridization being indicative of the presence of the
endogenous porcine retrovirus or retroviral sequence in the human subject.

In preferred embodiments, the target nucleic acid derived from a human subject is a
DNA, RNA or cDNA sample, or nucleic acid from a blood sample or a tissue sample, e.g., a
tissue biopsy sample.

In preferred embodiments, the human subject is a miniature swine or swine 
xenograft recipient, or a person who has come into contact with a miniature swine or swine
xenograft recipient.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%, 
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%, 
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence 
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the second nucleic acid is at least 10, more 
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100, 
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome. 

In preferred embodiments: the recipient is tested for the presence of porcine 
retroviral sequences prior to implantation of swine or miniature swine tissue.

In another aspect, the invention features a method of screening for viral mutations
which modulate, e.g., increase or decrease, susceptibility of a porcine retrovirus to an 
antiviral agent, e.g., an antiviral antibiotic. The method includes: 

administering a treatment, e.g., an antiviral agent, e.g., an antiviral antibiotic;

isolating a putative mutant porcine retroviral strain;

determining a structure of the putative mutant retroviral strain; and

comparing the structure to SEQ ID NO:1 or its complement, SEQ ID NO:2 or its
complement, or SEQ ID NO:3 or its complement.

In another aspect, the invention features a method of screening for viral mutations 
which modulate, e.g., increase or decrease, susceptibility of a porcine retrovirus to an
antiviral agent, e.g., an antiviral antibiotic. The method includes: 

growing the porcine retrovirus in a presence of a treatment, e.g., an antiviral agent, 
e.g., an antiviral antibiotic; and 

determining the amount of porcine retroviral DNA synthesized by hybridizing the
porcine retroviral DNA to a second sequence chosen from the group of: a sequence which
can specifically hybridize to a porcine retroviral sequence; a sequence which can
specifically hybridize to the sequence of SEQ ID NO:1 or its complement; a sequence
which can specifically hybridize to the sequence of SEQ ID NO:2 or its complement; a
sequence which can specifically hybridize to the sequence of SEQ ID NO:3 or its
complement; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes a gag protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2452-4839 (e.g., from nucleotides
3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from nucleotides 598-2169) of SEQ ID
NO:2, or nucleotides 585-2156 (e.g., from nucleotides 585-2156) of SEQ ID NO:3, or
naturally occurring mutants thereof; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence which encodes a pol protein; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence from nucleotides 4871-8060 of
SEQ ID NO:1, nucleotides 2320-4737 of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ
ID NO:3, or naturally occurring mutants thereof; a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence which encodes an env protein; a nucleic acid of
at least 10 consecutive nucleotides of sense or antisense sequence from nucleotides 2-1999
(e.g., from nucleotides 86-1999) of SEQ ID NO:1, nucleotides 4738-6722 (e.g., from
nucleotides 4738-6722) of SEQ ID NO:2, or nucleotides 5620-7533 of SEQ ID NO:3, or
naturally occurring mutants thereof; a swine or miniature swine retroviral nucleic acid; or a
Tsukuba nucleic acid.
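
By way of illustration only, the coordinate ranges recited above can be tabulated and used to enumerate candidate probes of at least 10 consecutive nucleotides of sense or antisense sequence. The following Python sketch is not part of the described methods; the function names and the toy sequence are hypothetical, and the actual sequences of SEQ ID NO:1-3 are not reproduced here. Coordinates are assumed to be 1-based and inclusive.

```python
# Region coordinates (1-based, inclusive) as recited in the text, keyed by SEQ ID NO.
REGIONS = {
    "gag": {1: (2452, 4839), 2: (598, 2169), 3: (585, 2156)},
    "pol": {1: (4871, 8060), 2: (2320, 4737), 3: (2307, 5741)},
    "env": {1: (2, 1999), 2: (4738, 6722), 3: (5620, 7533)},
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the antisense (reverse-complement) strand of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_probes(seq, start, end, length=10):
    """Yield (position, sense, antisense) for each window of `length`
    consecutive nucleotides lying inside the 1-based inclusive range
    [start, end] of `seq`."""
    region = seq[start - 1:end]
    for i in range(len(region) - length + 1):
        sense = region[i:i + length]
        yield start + i, sense, reverse_complement(sense)


# Hypothetical toy sequence standing in for a real genome (assumption).
toy = "ATGGCCATTGTAATGGGCCGC"
probes = list(candidate_probes(toy, 3, 14, length=10))
for position, sense, antisense in probes:
    print(position, sense, antisense)
```

With a real sequence, a call such as `candidate_probes(genome, *REGIONS["gag"][1])` would enumerate every 10-mer within the gag range of SEQ ID NO:1; whether a given window hybridizes specifically must still be determined experimentally.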

In preferred embodiments, the method further includes amplifying the porcine 
retroviral nucleic acid with primers which specifically hybridize to the sequence of SEQ ID
NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its
complement, e.g., by polymerase chain reaction quantitative DNA testing (PDQ). 

In a preferred embodiment: the second nucleic acid is a Tsukuba-1 retroviral
sequence, probe or primer, e.g., as described herein; the second nucleic acid is a porcine
retroviral sequence, probe or primer, e.g., as described herein; the second nucleic acid is the
sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID
NO:3 or its complement.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%,
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement. 

In other preferred embodiments: the second nucleic acid is at least 10, more 
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100, 
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome. 

In another aspect, the invention features a method for screening a porcine-derived
product for the presence or expression of a swine or miniature swine retrovirus or retroviral
sequence, e.g., an endogenous miniature swine retrovirus. The method includes:

contacting a target nucleic acid from the porcine-derived product with a second
sequence chosen from the group of: a sequence which can specifically hybridize to a
porcine retroviral sequence; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:1 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:2 or its complement; a sequence which can specifically hybridize
to the sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a gag protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides
598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g.,
from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of
SEQ ID NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2,
or nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine
or miniature swine retroviral nucleic acid; or a Tsukuba nucleic acid, under conditions in
which hybridization can occur, hybridization being indicative of the presence or expression
of an endogenous miniature swine or swine retrovirus or retroviral sequences in the
porcine-derived product.

In preferred embodiments the product is: a protein product, e.g., insulin; a food
product; or a cellular transplant, e.g., a swine or miniature swine cell which is to be
transplanted into a host, e.g., a swine or miniature swine cell which is genetically
engineered to express a desired product.

In preferred embodiments, the method further includes amplifying the target nucleic 
acid with primers which specifically hybridize to the sequence of SEQ ID NO: 1 or its 
complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement. 

In other preferred embodiments, the target nucleic acid is: DNA; RNA; or cDNA.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%, 
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%, 
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence 
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the second nucleic acid is at least 10, more 
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100, 
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome. 

In another aspect, the invention features a transgenic miniature swine or swine 
having a transgenic element, e.g., a base change, e.g., a change from A to G, or an insertion 
or a deletion of one or more nucleotides at an endogenous porcine retroviral insertion site, 
e.g., a retroviral insertion which corresponds to the retroviral genome of SEQ ID NO:1 or
its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement. 

In preferred embodiments, the transgenic element is a knockout, e.g., a deletion, 
insertion or a translocation, of one or more nucleic acids, which alters the activity of the 
endogenous porcine retrovirus. 

In another aspect, the invention features a method of inhibiting expression of an
endogenous porcine retrovirus, including: inserting a mutation, e.g., a deletion, into the
endogenous retrovirus.

In preferred embodiments, the endogenous porcine retrovirus is inactivated.

In preferred embodiments, the mutation can be a point mutation, an inversion,
translocation or a deletion of one or more nucleotides of SEQ ID NO:1 or its complement,
SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement. 

In another aspect, the invention features a method of detecting a recombinant virus 
or other pathogen, e.g., a protozoan or fungus. The method includes:

providing a pathogen having porcine retroviral sequence; and 

determining if the pathogen includes non-porcine retroviral sequence, the presence 
of non-porcine retroviral sequence being indicative of viral recombination. 

In preferred embodiments, the method further includes determining the structure of 
a retrovirus by comparing the retrovirus sequence with sequence of SEQ ID NO:1 or its
complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement, a 
difference being indicative of viral recombination. 

In preferred embodiments, the method further includes comparing the structure of 
the retrovirus with a human retroviral sequence, e.g., HTLV1, HIV1, or HIV2, a similarity
in structure being indicative of viral recombination.

In another aspect, the invention features a method of determining the copy number, 
size, or completeness of a porcine retrovirus or retroviral sequence, e.g., in the genome of a
donor, recipient or a graft. The method includes: 

contacting a target nucleic acid from the donor, recipient or a graft, with a second
sequence chosen from the group of: a sequence which can specifically hybridize to a
porcine retroviral sequence; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:1 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:2 or its complement; a sequence which can specifically hybridize
to the sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a gag protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides
598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g.,
from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of
SEQ ID NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine or
miniature swine retroviral nucleic acid; or a Tsukuba nucleic acid.

In preferred embodiments, the method further includes amplifying the porcine 
retroviral nucleic acid with primers which specifically hybridize to the sequence of SEQ ID 
NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its
complement, e.g., by polymerase chain reaction quantitative DNA testing (PDQ) or nested
PCR. 

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated 
from a miniature swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated 
from a miniature swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template,
isolated from a miniature swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA
made from an RNA template, isolated from a miniature swine organ which has been
transplanted into an organ recipient, e.g., a xenogeneic recipient, e.g., a primate, e.g., a
human. 

In preferred embodiments, the target nucleic acid includes: genomic DNA isolated
from a swine; RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine; DNA, RNA or cDNA, e.g., cDNA made from an RNA template, isolated from a
swine organ, e.g., a kidney; RNA, DNA or cDNA, e.g., cDNA made from an RNA
template, isolated from a swine organ which has been transplanted into an organ recipient,
e.g., a xenogeneic recipient, e.g., a primate, e.g., a human.

In preferred embodiments, the second nucleic acid has at least 60%, 70%, 72%, 
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with a sequence
from SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3
or its complement.

In other preferred embodiments: the second nucleic acid is at least 10, more
preferably at least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100,
1000, 2000, 4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15,
more preferably less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000,
6000, or 8060 nucleotides in length; the second nucleic acid is a full length retroviral
genome. 

In another aspect, the invention features a method for screening a tissue, e.g., a 
cellular or tissue transplant, e.g., a xenograft, or a tissue from a graft recipient, for the
presence or expression of a swine or a miniature swine retroviral sequence, e.g., an
endogenous miniature swine retrovirus. The method includes: contacting a tissue sample
with an antibody specific for a retroviral protein, e.g., an anti-gag, pol, or env antibody, and
thereby determining if the sequence is present or expressed.

In preferred embodiments the protein is encoded by a sequence from: the sequence
of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or
its complement. 

In preferred embodiments, the tissue is selected from the group consisting of: heart, 
lung, liver, bone marrow, kidney, brain cells, neural tissue, pancreas or pancreatic cells, 
thymus, or intestinal tissue. 

A "purified preparation" or a "substantially pure preparation" of a polypeptide as 
used herein, means a polypeptide which is free from one or more other proteins, lipids, and 
nucleic acids with which it naturally occurs. Preferably, the polypeptide is also separated
from substances which are used to purify it, e.g., antibodies or gel matrix, such as
polyacrylamide. Preferably, the polypeptide constitutes at least 10, 20, 50, 70, 80 or 95%
dry weight of the purified preparation. Preferably, the preparation contains: sufficient
polypeptide to allow protein sequencing; at least 1, 10, or 100 µg of the polypeptide; at
least 1, 10, or 100 mg of the polypeptide.

"Specifically hybridize", as used herein, means that a nucleic acid hybridizes to a
target sequence to a substantially greater degree than it does to other sequences in a
reaction mixture. "Substantially greater" means a difference sufficient to determine if the
target sequence is present in the mixture.

A "treatment", as used herein, includes any therapeutic treatment, e.g., the 
administration of a therapeutic agent or substance, e.g., a drug or irradiation. 

A "purified preparation of nucleic acid", is a nucleic acid which is one or both of: not 
immediately contiguous with one or both of the coding sequences with which it is 
immediately contiguous (i.e., one at the 5' end and one at the 3' end) in the
naturally-occurring genome of the organism from which the nucleic acid is derived; or which is
substantially free of a nucleic acid sequence or protein with which it occurs in the organism 
from which the nucleic acid is derived. The term includes, for example, a recombinant 
DNA which is incorporated into a vector, e.g., into an autonomously replicating plasmid or 
virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate 
molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction 
endonuclease treatment) independent of other DNA sequences. Substantially pure DNA 
also includes a recombinant DNA which is part of a hybrid gene encoding additional 
sequences. A purified retroviral genome is a nucleic acid which is substantially free of host 
nucleic acid or viral protein. 

"Homologous", as used herein, refers to the sequence similarity between two 
polypeptide molecules or between two nucleic acid molecules. When a position in both of 
the two compared sequences is occupied by the same amino acid or base monomer subunit, 
e.g., if a position in each of two DNA molecules is occupied by adenine, then the molecules 
are homologous at that position. The percent of homology between two sequences is a
function of the number of matching or homologous positions shared by the two sequences
divided by the number of positions compared x 100. For example, if 6 of 10 of the
positions in two sequences are matched or homologous then the two sequences are 60%
homologous. By way of example, the DNA sequences ATTGCC and TATGGC share 50%
homology. Generally, a comparison is made when two sequences are aligned to give
maximum homology. The term sequence identity has substantially the same meaning. 
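
The percent homology calculation defined above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and the sequences are assumed to be already aligned, ungapped, and of equal length, so the alignment step that maximizes homology is not performed here.

```python
def percent_homology(seq_a, seq_b):
    """Percent homology of two pre-aligned, equal-length sequences:
    matching positions / positions compared x 100."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# The example from the text: ATTGCC and TATGGC match at 3 of 6 positions.
print(percent_homology("ATTGCC", "TATGGC"))  # 50.0
```

Positions 3, 4 and 6 match in the example pair, giving 3 of 6 positions, or 50%.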

The term "provirus" or "endogenous retrovirus," as used herein, refers to an 
integrated form of the retrovirus. 

The terms "peptides", "proteins", and "polypeptides" are used interchangeably
herein.

As used herein, the term "transgenic element" means a nucleic acid sequence, which 
is partly or entirely heterologous, i.e., foreign, to the animal or cell into which it is 
introduced but which is designed to be inserted, or is inserted, into the animal's genome in 
such a way as to alter the genome of the cell into which it is inserted. The term includes
elements which cause a change in the sequence, or in the ability to be activated, of an
endogenous retroviral sequence. Examples of transgenic elements include those which 
result in changes, e.g., substitutions (e.g., A for G), insertions or deletions of an 
endogenous retroviral sequence (or flanking regions) which result in inhibition of activation 
or misexpression of a retroviral product. 

As used herein, the term "transgenic cell" refers to a cell containing a transgenic
element.

As used herein, a "transgenic animal" is any animal in which one or more, and 
preferably essentially all, of the cells of the animal include a transgenic element. The
transgenic element can be introduced into the cell, directly or indirectly by introduction into
a precursor of the cell, by way of deliberate genetic manipulation, such as by
microinjection. This molecule may be integrated within a chromosome, or it may be
extrachromosomally replicating DNA. 

As described herein, one aspect of the invention features a pure (or recombinant) 
nucleic acid which includes a miniature swine (or swine) retroviral genome or fragment
thereof, e.g., nucleotide sequence encoding a gag-pol or env polypeptide, and/or
equivalents of such nucleic acids. The term "nucleic acid", as used herein, can include
fragments and equivalents. The term "equivalent" refers to nucleotide sequences encoding
functionally equivalent polypeptides or functionally equivalent polypeptides which, for
example, retain the ability to react with an antibody specific for a gag-pol or env
polypeptide. Equivalent nucleotide sequences will include sequences that differ by one or
more nucleotide substitutions, additions or deletions, such as allelic variants, and will,
therefore, include sequences that differ from the nucleotide sequence of gag, pol, or env
shown herein due to the degeneracy of the genetic code.

"Misexpression", as used herein, refers to a non-wild type pattern of gene
expression, e.g., porcine retroviral, e.g., Tsukuba-1 gene expression, e.g., gag, pol or env
gene expression. It includes: expression at non-wild type levels, i.e., over or under 
expression; a pattern of expression that differs from wild type in terms of the time or stage 
at which the gene is expressed, e.g., increased or decreased expression (as compared with 
wild type) at a predetermined developmental period or stage; a pattern of expression that 
differs from wild type in terms of decreased expression (as compared with wild type) in a
predetermined cell type or tissue type; a pattern of expression that differs from wild type in
terms of the splicing, size, amino acid sequence, post-translational modification, stability,
or biological activity of the expressed porcine retroviral, e.g., Tsukuba-1, polypeptides; a
pattern of expression that differs from wild type in terms of the effect of an environmental 
stimulus or extracellular stimulus on expression of the porcine retroviral, e.g., Tsukuba-1 
genes, e.g., a pattern of increased or decreased expression (as compared with wild type) in 
the presence of an increase or decrease in the strength of the stimulus. 

Methods of the invention can be used with swine or miniature swine. 
Endogenous retrovirus is a potential source of infection not always susceptible to 
conventional breeding practices. Many proviruses are defective and unable to replicate. 
Provirus, if intact, can be activated by certain stimuli and then initiate viral replication 
using the host's cellular mechanisms. Retroviral infection will often not harm the host cell. 
However, replication of virus may result in viremia, malignant transformation (e.g., via
insertion of retroviral oncogenes), degeneration, or other insertional effects (e.g., gene 
inactivation). The effects of such infection may not emerge for many years. The spectrum 
of behavior of active lentiviral infection in humans is well described relative to HIV. These 
include AIDS, unusual infections and tumors, recombinant and other viruses, and antigenic 
variation which may prevent the generation of protective immunity by the infected host. 

Screening of animals will allow elimination of donors with active replication of 
known viruses. Inactive proviruses can be detected with genetic probes and removed or 
inactivated. These novel approaches will allow the identification and elimination of 
potential human pathogens derived from swine in a manner not possible in the outbred 
human organ donor population and, thus, will be important to the development of human 
xenotransplantation. 

The porcine retroviral sequences of the invention are also useful as diagnostic 
probes to detect activation of endogenous porcine retroviruses following transplantation 
and xenotransplantation of organs derived from swine or miniature swine. The porcine 
retroviral sequences of the invention also provide diagnostic tools necessary to assess the 
risks associated with transplantation of organs from swine or miniature swine into human 
recipients. These sequences are also useful for the longitudinal evaluation of retroviral 
activation in the human recipient of miniature swine-derived organs.

The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of cell biology, cell culture, molecular biology, transgenic biology,
microbiology, recombinant DNA, and immunology, which are within the skill of the art.
Such techniques are described in the literature. See, for example, Molecular Cloning: A
Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch and Maniatis (Cold Spring Harbor
Laboratory Press: 1989); DNA Cloning, Volumes I and II (D. N. Glover ed., 1985);
Oligonucleotide Synthesis (M. J. Gait ed., 1984); Mullis et al. U.S. Patent No: 4,683,195;
Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1984); Transcription And
Translation (B. D. Hames & S. J. Higgins eds. 1984); Culture Of Animal Cells (R. I.
Freshney, Alan R. Liss, Inc., 1987); Immobilized Cells And Enzymes (IRL Press, 1986); B.
Perbal, A Practical Guide To Molecular Cloning (1984); the treatise, Methods In
Enzymology (Academic Press, Inc., N.Y.); Gene Transfer Vectors For Mammalian Cells (J.
H. Miller and M. P. Calos eds., 1987, Cold Spring Harbor Laboratory); Methods In
Enzymology, Vols. 154 and 155 (Wu et al. eds.); Immunochemical Methods In Cell And
Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987); Handbook
Of Experimental Immunology, Volumes I-IV (D. M. Weir and C. C. Blackwell, eds., 1986);
Manipulating the Mouse Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, N.Y., 1986).

Although methods and materials similar or equivalent to those described herein can 
be used in the practice or testing of the present invention, the preferred methods and
materials are described below. All publications mentioned herein are incorporated by 
reference. In addition, the materials, methods, and examples are illustrative only and not 
intended to be limiting. 

Detailed Description of the Drawings 

Figure 1 is the nucleotide sequence (SEQ ID NO:1) of the Tsukuba-1 cDNA.

Figure 2 is the nucleotide sequence (SEQ ID NO: 2) of a defective retroviral 
genome isolated from the retrovirus from the PK-15 cell line. 

Figure 3 is the nucleotide sequence (SEQ ID NO: 3) of a retrovirus found in 
miniature swine. 

Detailed Description

Miniature Swine Retroviruses 

Transplantation may increase the likelihood of retroviral activation, if intact and 
infectious proviruses are present. Many phenomena associated with transplantation, e.g., 
immune suppression, graft rejection, graft-versus-host disease, viral co-infection, cytotoxic
therapies, radiation therapy or drug treatment, can promote activation of retroviral
expression. 

Many species are thought to carry retroviral sequences in their genomic DNA. The 
number of intact (complete) retroviral elements that could be activated is often unknown.
Once activated, swine-derived viruses would require the appropriate receptor on human
tissues to spread beyond the transplanted organ. Most intact endogenous proviruses
(usually types B and C), once activated, are not pathogenic. However, coinfection with 
other viruses, recombination with other endogenous viruses, or modification of viral 
behavior in the foreign human environment may alter the pathogenicity, organ specificity 
or replication of the retroviruses or other infectious agents. 

The lack of sequence data on pig viruses has impeded efforts to assess the number 
of porcine sequences, or porcine retroviral sequences, that have incorporated into the 
human genome or the frequency of incorporation. 

The inventor, by showing that the Tsukuba-1 retrovirus is found in miniature swine, 
and by providing the entire sequence of the porcine retroviral (Tsukuba-1) genome, has 
allowed assessment of the risk of endogenous retroviruses in general clinical practice and 
more importantly in xenotransplantation. 

The porcine retroviral sequences of the invention can be used to determine the level 
(e.g., copy number) of intact (i.e., potentially replicating) porcine provirus sequences in a 
strain of xenograft transplantation donors. For example, the copy number of the miniature 
swine retroviral sequences can be determined by the Polymerase Chain Reaction DNA 
Quantitation (PDQ) method, described herein, or by other methods known to those skilled 
in the art. This quantitation technique will allow for the selection of animal donors, e.g., 
miniature swine donors, without an intact porcine retroviral sequence or with a lower copy 
number of viral elements. 

The porcine retroviral sequences of the invention can be used to determine if 
mutations, e.g., inversions, translocations, insertions or deletions, have occurred in the 
endogenous porcine retroviral sequence. Mutated viral genomes may be expression- 
deficient. For example, genetic lesions can be identified by exposing a probe/primer 
derived from porcine retrovirus sequence to nucleic acid of the tissue (e.g., genomic DNA) 
digested with a restriction endonuclease or by in situ hybridization of the probe/primer
derived from the porcine retroviral sequence to the nucleic acid derived from donor, e.g., 
miniature swine, tissue. Alternatively, direct PCR analysis, using primers specific for 
porcine retroviral genes (e.g., genes comprising the nucleotide sequence shown in SEQ ID 
NO: 1, 2, or 3), can be used to detect the presence or absence of the genetic lesion in the 
porcine retroviral genome. 

Miniature swine retroviral sequences of the invention can also be used to detect viral
recombinants within the genome, or in the circulation, cells, or transplanted tissue, between
the porcine retrovirus and other endogenous human viruses or opportunistic pathogens (e.g.,
cytomegalovirus) of the immunocompromised transplant recipient. For example, pieces of
the viral genome can be detected via PCR or via hybridization, e.g., Southern or Northern
blot hybridization, using sequences derived from SEQ ID NO:1, 2, or 3 as primers for
amplification or probes for hybridization. 

Miniature swine retroviral sequences of the invention, e.g., PCR primers, allow
quantitation of activated virus. Sequences of the invention also allow histologic
localization (e.g., by in situ hybridization) of activated retrovirus. Localization allows
clinicians to determine whether a graft should be removed as a source of potential retroviral
infection of the human host or whether the retroviral infection was localized outside the
graft. Sequences of the invention, e.g., PCR primers, allow the detection of actively
replicating virus, e.g., by using reverse transcribed PCR techniques known in the art.
Standard techniques for reverse transcriptase measurements are often complicated,
species-specific, and are of low sensitivity and specificity, and false positive results may develop
using full-length probes for Southern and Northern molecular blotting. Sequences of the 
invention allow for sensitive and specific assays for the activation of virus and this will 
allow performance of a wide variety of tests, some of which are outlined below. 

The invention provides for the testing and development of donor animals having
reduced numbers of intact proviral insertions. It also provides for the testing of
immunosuppressive regimens less likely to provide the conditions for active replication of
retrovirus. Conditions likely to activate one retrovirus are generally more likely to activate
other viruses including unknown retroviruses and known human pathogens including
cytomegalovirus, hepatitis B and C viruses, and Human Immunodeficiency Viruses (I and II).
Given the availability of preventative therapies for these infections, these therapies could be
used prophylactically in patients known to be susceptible to the activation of porcine 
retrovirus. 

The miniature swine retroviral sequences of the invention can be used to measure
the response of the miniature swine retroviral infection in humans to therapy, e.g.,
immunomodulatory or antiviral therapy, e.g., antiviral agents, e.g., antiviral antibiotics.
With HIV, susceptibility to antiviral antibiotics is determined by the genetic sequence of
the reverse transcriptase gene (RT pol region) and other genes. The ability to determine the
exact sequence of the retroviral genes will allow the detection of mutations occurring
during infection which would then confer resistance of this virus to antiviral agents.
Primers, e.g., for the RT-pol region, of the invention can be used to detect and to sequence
clinical viral isolates from patients which have developed mutations by the PDQ method
described herein. The primers of the invention can also be used to determine whether
tumor cells, e.g., cancer cells, e.g., lymphoma or hepatocellular carcinoma, developing in
xenograft recipients contain porcine retroviral elements.

The porcine retroviral sequences of the invention can also be used to detect other 
homologous retroviruses and to determine whether these are the same or different as
compared to the Tsukuba-1 retroviral sequences. For example, within a species, the
polymerase genes are highly conserved. PCR assays aimed at the gag-pol region followed
by sequence analysis allow for this detection of homologous viruses. The appropriate
regions of the Tsukuba-1 virus can be determined by using sequences derived from SEQ ID
NO:1, described herein, to identify additional 5' and 3' viral genomic sequences. As is
discussed elsewhere herein, the sequences from SEQ ID NO:1 were used to obtain the
sequence of the PK-15 retroviral insert (SEQ ID NO:2) and of a retroviral insertion in a 
miniature swine (SEQ ID NO:3). 

Miniature swine retroviral sequences of the invention can be used to screen donor 
animals and xenograft recipients after transplantation both for infection, and as a measure 
of the appropriate level of immune suppression, regarding susceptibility to infection. 
Physicians, medical staff, family, or individuals who come into contact with graft 
recipients, and others, can be screened for infection with virus derived from the xenograft 
recipient. Members of the population in general can also be screened. Such screening can 
be used for broad epidemiologic studies of the community. These methods can help in 
meeting the requirements of the F.D.A. regarding enhancing the safety of the recipients and 
of the community to exposure to new viruses introduced into the community by xenograft 
transplantation. 

As is shown in Suzuka et al., 1986, FEBS 198:339, the swine retroviruses such as
the Tsukuba-1 genome can exist as a circular molecule. Upon cloning, the circular molecule
is generally cleaved to yield a linear molecule. As will be understood by one skilled in the
art, the start point and end point of the resulting linear molecule, and the relative subregions
of the viral sequence will of course vary with the point of cleavage. For example, in the
Suzuka et al. reference the LTR is shown to be in an internal fragment. This is indicated
herein in that the order of gag, pol, env in SEQ ID NO:1 is shown as env, gag, pol, while
elsewhere herein the order of these regions is given as the naturally occurring gag, pol, env
order.
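
The dependence of region order on the point of cleavage can be illustrated with a minimal sketch. The region labels, their arrangement on the circle, and the function name `linearize` are illustrative assumptions; they are not coordinates taken from SEQ ID NO:1.

```python
def linearize(circular_regions, cut_index):
    """Open a circular list of regions so that the region at cut_index comes first."""
    n = len(circular_regions)
    if not 0 <= cut_index < n:
        raise ValueError("cut_index must fall within the region list")
    return circular_regions[cut_index:] + circular_regions[:cut_index]


# Hypothetical circular molecule, listed starting from the LTR.
circular = ["LTR", "gag", "pol", "env"]

# Cleaving just before env places the LTR in an internal position and makes the
# coding regions read env, gag, pol, although the circle itself is unchanged.
linear = linearize(circular, circular.index("env"))
print(linear)
```

The same circular molecule therefore yields different linear region orders depending only on where it is opened.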



Primers Derived from the Porcine Retroviral (Tsukuba-1) Genome Sequence

A number of different primers useful in the methods of the invention have been 
described herein. One skilled in the art can identify additional primers from the viral 
sequence of SEQ ID NO:1 by using methods known in the art. For example, when trying
to identify potentially useful primers one skilled in the art would look for sequences
(sequences should be between about 15 and 30 nucleotides in length) which hybridize to
SEQ ID NO:1 with high melting temperature; have a balanced distribution of nucleotides,
e.g., a balanced distribution of A, T, C and Gs; have a terminal C or G; and do not
self-hybridize or internally complement.
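
The screening criteria listed above can be sketched as a simple filter. This is a minimal illustration, not the method used to select the primers disclosed herein. The thresholds (`max_base_fraction`, the word size `k`), the reading of "terminal" as the 3' end, and the use of the Wallace rule of thumb for melting temperature are all assumptions.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[b] for b in reversed(seq))


def wallace_tm(seq):
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C) (assumed rule of thumb)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def has_self_complement(seq, k=6):
    """True if any k-mer of seq also occurs in its own reverse complement."""
    rc = reverse_complement(seq)
    return any(seq[i:i + k] in rc for i in range(len(seq) - k + 1))


def screen_primer(seq, min_len=15, max_len=30, max_base_fraction=0.4, k=6):
    """Apply the length, balance, terminal-base and self-complementarity checks."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {
        "length": min_len <= len(seq) <= max_len,
        "balanced": all(seq.count(b) / len(seq) <= max_base_fraction for b in "ACGT"),
        "terminal_c_or_g": seq[-1] in "CG",  # assumption: 3' terminal base
        "no_self_complement": not has_self_complement(seq, k),
    }


# Example call on a 20-mer; candidates passing every check would still need
# to be confirmed experimentally.
candidate = "GCAGGTGTAGGAACAGGAAC"
print(screen_primer(candidate), wallace_tm(candidate))
```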

Use of Primers Derived from the Porcine Retroviral (Tsukuba-1) Genome Sequence

I. Testing of organs or cells prior to transplantation

Potential donor animals can be screened for active retroviral replication prior to
being used in transplantation. This allows avoidance of animals undergoing active viral
replication. Replicating virus is often infectious in 100% of recipients, while
nonreplicating, latent provirus generally causes infection in 5 to 25% of recipients.

II. Testing of recipients

Serial samples, e.g., of white blood cells, can be obtained from a graft recipient
monthly, e.g., for the first month and every three months thereafter. Tissue biopsies
obtained for evaluation of graft function can be used to evaluate the activation of retroviral
sequences or of the expression of retroviral sequences in graft tissue. Samples can be screened
for the presence of retrovirus infection both specifically for the homologous virus, for viral
recombinants containing portions of the viral genome, and for other retroviruses, using,
e.g., PCR primers for the pol region of the virus, which is the region most likely to be
conserved. If virus is detected, quantitative PCR can be used to determine the relative
stability of viral production. Cells isolated from xenograft recipients can be tested by
cocultivation with permissive human and porcine (e.g., pig fallopian tube, pig macrophage,
or pig testis) cell lines known to contain endogenous viruses. Isolated virus will be tested
for homology with the parental strain and for mutations which might affect susceptibility to
antiviral agents, e.g., antiviral antibiotics.

III. Testing of surgical and medical personnel and family members of graft
recipient

Samples, e.g., white blood cells, can be banked (archived) from the surgical and
medical personnel and from family members of the recipient prior to transplantation and at
three-month intervals for the first year and at least annually thereafter. Epidemiologic
studies can be performed on these samples as well. These samples can be tested if the
recipient becomes viremic or if unusual clinical manifestations are noted in these
individuals.

IV. Testing of tumor cells 

Tumor cells which develop from a graft, or a graft recipient, can be tested for the
presence of active retrovirus and for proviruses.

V. Testing of patients

Patients can be retested for any significant change in clinical condition or for
increased immune suppression of graft rejection which may be associated with an increased
risk of viral activation.

Sequencing of the porcine retroviral (Tsukuba-1) genome

A clone (PX8.8) containing the 8060 bp XhoI porcine retrovirus (Tsukuba-1) insert
was used to transfect competent E. coli, and DNA was isolated for sequencing. The
strategy used to sequence the 8060 bp porcine retrovirus genome included a combination of
procedures which are outlined below.

Random fragments (1-3 kb) of the clone (PX8.8) were generated by sonication. The
fragments were blunt-ended and were subcloned into the EcoRV site of the pBluescript SK
vector. Plasmid DNA was prepared using a modified alkaline lysis procedure. DNA
sequencing was performed using DyeDeoxy termination reactions (ABI). Base specific
fluorescent dyes were used as labels. Sequencing reactions were analyzed on 4.75%
polyacrylamide gels by an ABI 373A-S or 373S automated sequencer. Subsequent data
analysis was performed on Sequencer™ 3.0 software. The following internal sequencing
primers were synthesized:

AP1   5'  GATGAACAGGCAGACATCTG   3'  (SEQ ID NO: )
AP2   5'  CGCTTACAGACAAGCTGTGA   3'  (SEQ ID NO:49)
AP3   5'  AGAACAAAGGCTGGGAAAGC   3'  (SEQ ID NO:50)
AP4   5'  ATAGGAGACAGCCTGAACTC   3'  (SEQ ID NO:51)
AP5   5'  GGACCATTGTCTGACCCTAT   3'  (SEQ ID NO:52)
AP6   5'  GTCAACACCTATACCAGCTC   3'  (SEQ ID NO:53)
AP7   5'  CATCTGAGGTATAGCAGGTC   3'  (SEQ ID NO:54)
AP8   5'  GCAGGTGTAGGAACAGGAAC   3'  (SEQ ID NO:55)
AP9   5'  ACCTGTTGAACCATCCCTCA   3'  (SEQ ID NO:56)
AP10  5'  CGAATGGAGAGATCCAGGTA   3'  (SEQ ID NO:57)
AP11  5'  CCTGCATCACTTCTCTTACC   3'  (SEQ ID NO:58)
AP12  5'  TTGCCTGCTTGTGGAATACG   3'  (SEQ ID NO:59)
AP13  5'  CAAGAGAAGAAGTGGGGAATG  3'  (SEQ ID NO:60)
AP14  5'  CACAGTCGTACACCACGCAG   3'  (SEQ ID NO:61)
AP15  5'  GGGAGACAGAAGAAGAAAGG   3'  (SEQ ID NO:62)
AP16  5'  CGATAGTCATTAGTCCCAGG   3'  (SEQ ID NO:63)
AP17  5'  TGCTGGTTTGCATCAAGACCG  3'  (SEQ ID NO:64)
AP18  5'  GTCGCAAAGGCATACCTGCT   3'  (SEQ ID NO:65)
AP19  5'  ACAGAGCCTCTGCTAAGAAG   3'  (SEQ ID NO:66)
AP20  5'  GCAGCTGTTGACAATCATC    3'  (SEQ ID NO:67)
AP21  5'  TATGAGGAGAGGGCTTGACT   3'  (SEQ ID NO:68)
AP22  5'  AGCAGACGTGCTAGGAGGT    3'  (SEQ ID NO:69)
AP23  5'  TCCTCTTGCTGTTTGCATC    3'  (SEQ ID NO:70)
AP24  5'  CAGACACTCAGAACAGAGAC   3'  (SEQ ID NO:71)
AP25  5'  ACATCGTCTAACCCACCTAG   3'  (SEQ ID NO:72)
AP26  5'  CTCGTTTCTGGTCATACCTGA  3'  (SEQ ID NO:73)
AP27  5'  GAGTACATCTCTCTAGGCA    3'  (SEQ ID NO:74)
AP28  5'  TGCCTAGAGACATGTACTC    3'  (SEQ ID NO:4)
AP29  5'  CCTCTTCTAGCCATTCCTTCA  3'  (SEQ ID NO:5)

The clone (PX8.8) containing the 8060 bp XhoI porcine retrovirus (Tsukuba-1) insert was
deposited with ATCC on December 27, 1995 (ATCC Deposit No. 97396).

Determination of the porcine retroviral (Tsukuba-1) copy number in a miniature swine

Total genomic DNA was isolated from miniature swine kidney by methods
known in the art. The isolated genomic DNA was digested with either EcoRI or HindIII
restriction enzyme. The DNA digests were electrophoresed on an agarose gel, Southern
blotted and hybridized to the full-length, purified Tsukuba-1 sequence (SEQ ID NO:1)
under high stringency conditions (0.1 X SSC, 65°C). In both digested samples (EcoRI or
HindIII) at least six copies of the high molecular weight fragments of the miniature swine genome
(over 16 Kb in size) hybridized to SEQ ID NO:1, indicating the presence of homologous
retroviral sequences in porcine DNA.

Susceptibility Testing by Polymerase Chain Reaction DNA Quantitation (PDQ)

Polymerase chain reaction (PCR) DNA quantitation (PDQ) susceptibility testing
can be used to rapidly and directly measure nucleoside sensitivity of porcine retrovirus
isolates.

PCR can be used to quantitate the amount of porcine retroviral RNA synthesized after in
vitro infection of peripheral blood mononuclear cells. The relative amounts of porcine
retroviral RNA in cell lysates from cultures maintained at different drug concentrations
reflect drug inhibition of virus replication. With the PDQ method both infectivity titration
and susceptibility testing can be performed on supernatants from primary cultures of
peripheral blood mononuclear cells.

The PDQ experiments can be performed essentially as described by Eron et al.,
PNAS USA 89:3241-3245, 1992. Briefly, aliquots (150 µl) of serial dilutions of virus
sample can be used to infect 2 x 10^6 PHA-stimulated donor PBMCs in 1.5 ml of growth
medium per well of a flat-bottom 24-well plate (Corning). Separate cell samples can be
counted, harvested, and lysed at 48, 72 and 96 hr. Quantitative PCR and porcine retrovirus
copy-number determination can then be performed in duplicate on each lysate.

The results of a PDQ infectivity titration assay can be used to determine the virus
dilution and length of culture time employed in a subsequent PDQ susceptibility test.
These parameters should be chosen so that the yield of porcine retrovirus specific PCR
product for the untreated control infection would fall on the porcine retrovirus copy-number
standard curve before the curve approached its asymptotic maximum, or plateau.
PHA-stimulated donor PBMCs can be incubated with drug for 4 hr prior to infection. Duplicate
wells in a 24-well plate should receive identical porcine retrovirus inocula for each drug
concentration tested and for the untreated infected controls. Uninfected controls and drug
toxicity controls should be included in each experiment. All cultures can be harvested and
cells lysed for PCR after either 48 or 72 hr. Previously characterized isolates can be used as
assay standards in each experiment.
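
How copy numbers from such a susceptibility test might be reduced to a single drug-sensitivity value can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, the copy numbers are assumed to be means of duplicate wells, and log-linear interpolation between the two bracketing drug concentrations is an assumed convention rather than a procedure stated herein.

```python
import math


def percent_inhibition(treated_copies, control_copies):
    """Inhibition of viral DNA yield relative to the untreated infected control."""
    if control_copies <= 0:
        raise ValueError("control copy number must be positive")
    return 100.0 * (1.0 - treated_copies / control_copies)


def ic50(concentrations, copies, control_copies):
    """Interpolate the 50% inhibitory concentration on a log10 concentration scale.

    Returns None when no pair of adjacent concentrations brackets 50% inhibition.
    All concentrations must be positive.
    """
    previous = None
    for conc, n in sorted(zip(concentrations, copies)):
        inhibition = percent_inhibition(n, control_copies)
        if previous is not None and previous[1] < 50.0 <= inhibition:
            c0, i0 = previous
            frac = (50.0 - i0) / (inhibition - i0)
            log_c = math.log10(c0) + frac * (math.log10(conc) - math.log10(c0))
            return 10 ** log_c
        previous = (conc, inhibition)
    return None


# Hypothetical data: drug concentrations and mean copy numbers per well.
print(ic50([0.01, 0.1, 1.0], [950, 750, 250], control_copies=1000))
```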

Cell pellets can be lysed in various volumes of lysis buffer (50 mM KCl/10 mM
Tris·HCl, pH 8.3/2.5 mM MgCl2/0.5% Nonidet P-40/0.5% Tween 20/0.01% proteinase K) to
yield a concentration of 1.2 x 10^4 cell equivalents/µl. Uniformity of cell lysate DNA
concentrations should be confirmed in representative experiments by enhancement of
Hoechst 33258 fluorescence (Mini-Fluorometer, Hoefer).

A conserved primer pair can be synthesized according to the pol gene sequences.
The primers can then be used to amplify a 1580-base pair fragment of the porcine retrovirus
pol gene from 1.2 x 10^5 cell equivalents of lysate by using PCR (GeneAmp, Cetus) under
standard conditions. Amplifications should be repeated if porcine retrovirus DNA is
amplifiable from reagent controls.
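
The lysate figures above imply a simple volume calculation, sketched here. The two constants are the values given in the text (1.2 x 10^4 cell equivalents per microliter of lysate and 1.2 x 10^5 cell equivalents per amplification); the function names and the example cell count are hypothetical.

```python
LYSATE_CONC = 1.2e4  # cell equivalents per microliter of lysate
PCR_INPUT = 1.2e5    # cell equivalents per amplification


def lysis_volume_ul(cell_count, conc=LYSATE_CONC):
    """Volume of lysis buffer (microliters) needed to reach the target concentration."""
    return cell_count / conc


def lysate_per_pcr_ul(pcr_input=PCR_INPUT, conc=LYSATE_CONC):
    """Volume of lysate (microliters) that contains the PCR input."""
    return pcr_input / conc


# Example: a pellet of 2.4 million counted cells (hypothetical count).
print(lysis_volume_ul(2.4e6), lysate_per_pcr_ul())
```

Under these figures each amplification uses about 10 microliters of lysate.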

Porcine retrovirus pol gene amplification products can be specifically detected and
quantitated as described (Conway, B.C. (1990) in Techniques in HIV Research (Aldovini
& Walker, eds.) (Stockton, New York) pp. 40-46). Heat-denatured PCR products can be
hybridized in a Streptavidin-coated microtiter plate well with both biotinylated capture
probe and horseradish peroxidase (HRP)-labeled detector probe (enzyme-linked
oligonucleotide solution sandwich hybridization assay (ELOSA), DuPont Medical
Products, Billerica, MA) for 60 min at 37°C. After extensive washing to remove all
reactants except probe-DNA hybrids, an HRP chromogen, tetramethyl benzidine (TMBlue,
Transgenic Sciences, Worcester, MA), should be added to each well. The HRP-catalyzed
color development should be stopped after 1 hr by addition of sulfuric acid to 0.65 M.
Absorbance (OD) at 450 nm can be measured in an automated microtiter plate reader (SLT
Labinstruments, Hillsborough, NC).

A standard curve of porcine retrovirus DNA copy number can be generated in each 
PCR by using a dilution series of cells containing one porcine proviral genome per cell. 
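
Reading an unknown sample off such a standard curve can be sketched as below. This is a minimal illustration under stated assumptions: the standards are (copy number, OD) pairs from the dilution series, interpolation is linear in OD against log10 copy number between adjacent standards, and readings outside the measured range are rejected. The function name and all numeric values are hypothetical.

```python
import math


def copies_from_od(od, standards):
    """Estimate copy number from absorbance using a piecewise standard curve.

    standards: list of (copy_number, od) pairs from the dilution series.
    Returns None when od lies outside the range covered by the standards.
    """
    points = sorted(standards, key=lambda p: p[1])
    for (c0, od0), (c1, od1) in zip(points, points[1:]):
        if od1 > od0 and od0 <= od <= od1:
            frac = (od - od0) / (od1 - od0)
            log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    return None


# Hypothetical standards taken from the rising part of the curve, below plateau.
standards = [(10, 0.1), (100, 0.5), (1000, 0.9)]
print(copies_from_od(0.3, standards))
```

Restricting the standards to points below the plateau mirrors the requirement that the untreated control fall on the curve before it approaches its asymptotic maximum.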

Preparation of a miniature swine having a knockout of Tsukuba-1 viral sequence using
isogenic DNA targeting vectors

Isogenic DNA, or DNA that is substantially identical in sequence between the 
targeting vector and the target DNA in the chromosomes, greatly increases the frequency 
for homologous recombination events and gene targeting efficiency. Using isogenic-DNA 
targeting vectors, targeting frequencies of 80% or higher can be achieved in mouse 
embryonic stem cells. This is in contrast to non-isogenic DNA vectors which normally 
yield targeting frequencies of around 0.5% to 5%, i.e., approximately two orders of
magnitude lower than isogenic DNA vectors. Isogenic DNA constructs are predominantly 
integrated into chromosomes by homologous recombination rather than random integration. 
As a consequence, targeted mutagenesis of viral sequences, e.g., viral genes, can be carried 
out in biological systems including zygotes, which do not lend themselves to the use of 
elaborate selection protocols, resulting in production of animals, e.g., miniature swine, free
of, or having a reduced number of, activatable viral sequences. In order for the isogenic 
DNA approach to be feasible, targeting vectors should be constructed from a source of 
DNA that is identical to the DNA of the organism to be targeted. Ideally, isogenic DNA 
targeting is carried out in inbred strains of animals, e.g., inbred miniature swine, in which 
all genetic loci are homozygous. Any animal of that strain can serve as a source for 
generating isogenic targeting vectors. This protocol for isogenic gene targeting is outlined 
in Te Riele et al., PNAS 89:5128-5132, 1992 and PCT/US92/07184, herein incorporated by
reference. A protocol for producing Tsukuba-1 knockout miniature swine is described 
briefly below.

An insertion vector is designed as described by Hasty and Bradley (Gene Targeting
Vectors for Mammalian Cells, in Gene Targeting: A Practical Approach, ed. Alexandra L.
Joyner, IRL Press 1993). Insertion vectors require that only one crossover event occur for
integration by homologous recombination into the native locus. The double strand breaks,
the two ends of the vector which are known to be highly recombinogenic, are located on
adjacent sequences on the chromosome. The targeting frequencies of such constructions 
will be in the range of 30 to 50%. One disadvantage of insertion vectors, in general, 
concerns the sequence duplications that are introduced and that potentially make the locus 
unstable. All these constructions are made using standard cloning procedures. 

Replacement vectors have also been extensively described by Hasty and Bradley.
Conceptually more straightforward than the insertion vector, replacement vectors use an
essentially co-linear fragment of a stretch of Tsukuba-1 genomic sequence. Preferably, the
DNA sequence from which an isogenic replacement vector is constructed includes
approximately 6 to 10 kb of uninterrupted DNA. Two crossovers, one on either side of the
selectable marker, cause the mutant targeting vector to become integrated and replace the
wild-type gene.

Microinjection of the isogenic transgene DNA into one of the pronuclei of a porcine
embryo at the zygote stage (one-cell embryo) is accomplished by modification of a protocol
described earlier (Hammer et al. 1985, Nature 315, 680; Pursel et al. 1989, Science 244,
1281). The age and the weight of the donor pigs, e.g., haplotype specific mini-swine, are
critical to success. Optimally, the animals are of age 8 to 10 months and weigh 70 to 85
lbs. This increases the probability of obtaining an adequate supply of one-cell embryos for
microinjection of the transgenes. In order to allow for accurate timing of the embryo
collections at this stage from a number of embryo donors, the gilts are synchronized using a
preparation of synthetic progesterone (Regumate). Hormone implants are applied to
designated gilts 30 days prior to the date of embryo collection. Twenty days later, ten days
prior to the date of collection, the implants are removed and the animals are treated with
additional hormones to induce superovulation to increase the number of embryos for
microinjection. Three days following implant removal, the animals are treated with 400 to
1000 IU of pregnant mare serum gonadotropin (PMSG) and with 750 IU of human
chorionic gonadotropin (hCG) three to four days later. These animals are bred by artificial
insemination (AI) on two consecutive days following injection of hCG.
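
The synchronization timeline above can be laid out as calendar dates with a short sketch. The day offsets are those given in the text; the function name, the example collection date, and the reading of "two consecutive days following injection of hCG" as the first and second day after hCG are assumptions.

```python
from datetime import date, timedelta


def synchronization_schedule(collection_date, hcg_delay_days=3):
    """Derive treatment dates from the planned date of embryo collection."""
    if hcg_delay_days not in (3, 4):
        raise ValueError("the text gives three to four days between PMSG and hCG")
    implant_applied = collection_date - timedelta(days=30)
    implant_removed = implant_applied + timedelta(days=20)  # ten days before collection
    pmsg = implant_removed + timedelta(days=3)
    hcg = pmsg + timedelta(days=hcg_delay_days)
    return {
        "implant_applied": implant_applied,
        "implant_removed": implant_removed,
        "pmsg": pmsg,
        "hcg": hcg,
        # Assumption: AI on the first and second day after the hCG injection.
        "ai_days": [hcg + timedelta(days=1), hcg + timedelta(days=2)],
    }


# Example with a hypothetical planned collection date.
for step, when in synchronization_schedule(date(2024, 1, 31)).items():
    print(step, when)
```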

Embryo collections are performed as follows: three days following the initial
injection of hCG, the animals are anesthetized with an intramuscular injection of Telazol (3
mg/lb), Rompun (2 mg/lb) and Atropine (1 mg/lb). A midline laparotomy is performed
and the reproductive tract exteriorized. Collection of the zygotes is performed by
cannulating the ampulla of the oviduct and flushing the oviduct with 10 to 15 ml phosphate
buffered saline, prewarmed to 39°C. Following the collection the donor animals are
prepared for recovery from surgery according to USDA guidelines. Animals used twice for
embryo collections are euthanized according to USDA guidelines.

Injection of the transgene DNA into the pronuclei of the zygotes is carried out as
summarized below: Zygotes are maintained in medium HAM F-12 supplemented with
10% fetal calf serum at 38°C in 5% CO2 atmosphere. For injection the zygotes are placed
into BMOC-2 medium, centrifuged at 13,000 g to partition the embryonic lipids and
visualize the pronuclei. The embryos are placed in an injection chamber (depression slide)
containing the same medium overlaid with light paraffin oil. Microinjection is performed
on a Nikon Diaphot inverted-microscope equipped with Nomarski optics and Narishige
micromanipulators. Using 40x lens power the embryos are held in place with a holding
pipette and injected with a glass needle which is back-filled with the solution of DNA
containing the transgenic element, e.g., a mutant viral gene (2 µg/ml). Injection of
approximately 2 picoliters of the solution (4 femtograms of DNA), which is equivalent to
around 500 copies of the transgenic element, e.g., a mutant viral gene, is monitored by the
swelling of the pronucleus by about 50%. Embryos that are injected are placed into the
incubator prior to transfer to recipient animals.
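
The relationship between injected volume, DNA mass and copy number stated above can be checked with a short calculation. The construct length of 7,400 bp is a hypothetical value chosen for illustration, and the average mass of 650 g/mol per base pair of double-stranded DNA is an assumed approximation; neither figure is taken from the text.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 650.0       # g/mol per base pair (assumed average for double-stranded DNA)


def injected_mass_fg(volume_pl, conc_ug_per_ml):
    """Mass of DNA in femtograms; 1 pl of a 1 ug/ml solution contains 1 fg."""
    return volume_pl * conc_ug_per_ml


def copies(mass_fg, construct_bp):
    """Number of double-stranded molecules of the given length in mass_fg of DNA."""
    grams = mass_fg * 1e-15
    return grams * AVOGADRO / (construct_bp * BP_MASS)


mass = injected_mass_fg(2, 2)   # 2 picoliters at 2 micrograms per milliliter
print(mass, copies(mass, 7400))
```

With these assumptions a 4 femtogram injection corresponds to roughly 500 molecules of a construct a little over 7 kb long.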

Recipient animals are prepared similarly to the donor animals, but not
superovulated. Prior to the transfer of the injected embryos, recipient gilts are anesthetized,
the abdomen opened surgically by applying a longitudinal incision and the ovaries
exteriorized. The oviduct ipsilateral to the ovary with the larger number of corpora lutea is
flushed, and the embryos are checked to evaluate if the animal is reproductively sound.
Approximately 4 to 6 zygotes injected with the transgenic element, e.g., a mutant viral
gene, are transferred to the flushed oviduct, the abdominal incision sutured and the animals
placed in a warm area for recovery. The status of the pregnancy is monitored by ultrasound
starting at day 25, or approximately one week following the expected date of implantation.
Pregnant recipients are housed separately until they are due to farrow.

Newborn piglets are analyzed for integration of the transgenic element into
chromosomal DNA. Genomic DNA is extracted from an ear punch or a blood sample and
initial screening is performed using PCR. Animals that are potentially transgenic
element-positive are confirmed by Southern analysis. Transgenic founder animals are subjected to
further analysis regarding the locus of transgenic element integration using Southern
analysis.

The isolation and sequencing of an endogenous swine retroviral insert and of a retroviral
insert in porcine PK-15 cells

Cloning of PK15 and PAL endogenous retroviruses

I. Poly A+ RNA isolation

Peripheral blood lymphocytes (PBLs) were prepared from haplotype d/d miniswine
using standard protocols known in the art. The PBLs were cultured in the presence of 1%
phytohemagglutinin (PHA) for about 84 hours. The activated PBLs were collected and
total RNA was isolated using commercially available kits, such as Gentra's (Minneapolis,
Minnesota) PUREscript Kit. Poly A+ RNA was isolated from the total RNA using another
commercially available product, Dynal Dynabeads (Lake Success, NY). Northern analysis
of the RNA using a pig retroviral probe confirmed the presence of potentially full-length
retroviral genome RNA. RNA from PK15 cells was isolated using similar protocols.

II. Construction of the cDNA libraries

Using Superscript Choice System (Life Technologies Ltd, Gibco BRL,
Gaithersburg, MD) for cDNA Synthesis, a cDNA library was constructed using oligo dT to
make the first strand cDNA. The use of Superscript reverse transcriptase was important in
order to obtain full-length retroviral (RV) cDNAs, due to the length of the RV RNA. The
cDNA library was enriched for large cDNA fragments by size selecting >4 kb fragments by
gel electrophoresis. The cDNAs were cloned into Lambda ZAP Express (Clontech
Laboratories, Inc. Palo Alto, CA), which is one of the few commercially available cDNA
vectors that would accept inserts in the 1-12 kb range.

III. Screening of the cDNA libraries

0.75 - 1.2 x 10^ independent clones were screened using either gag and pol or gag
and env probes. Double positive clones were further purified until single isolates were
obtained (1 or 2 additional rounds of screening).

IV. Characterization of the clones

Between 18 and 30 double positive clones were selected for evaluation. Lambda
DNA was prepared using standard protocols, such as the Lambda DNA Kit (Qiagen Inc.,
Chatsworth, CA). The clones were analyzed by PCR to (a) check for RV genes, and (b)
determine the size of insert and LTR regions. Restriction digests were also done to confirm
the size of insert and to attempt to categorize the clones. Clones containing the longest
inserts and having consistent and predicted PCR data were sequenced.

Development of a PCR-based assay for the detection of the presence of an endogenous
retrovirus in cells, tissues, organs, miniswine or recipient hosts (e.g., primates, humans)

Using a commercially available computer software program (such as RightPrimer,
Oligo 4.0, MacVector or Geneworks), one can analyze sequences disclosed herein for the
selection of PCR primer pairs. The criteria for the general selection of primer pairs
include:

a. The Tm of each primer is between 65-70°C
b. The Tm's for each pair differ by no more than 3°C
c. The PCR fragment is between 200-800 bp in length
d. There are no repeats, self complementary bases, primer-dimer issues, etc. for
each pair

A. Additional criteria for: A pig-specific PCR assay

a. Primers are selected within porcine-specific regions of the sequence, such as
within gag, env, or U3. Porcine-specific primers are defined as sequences which overall
have <70% homology to the corresponding region in human, mouse and primate
retroviruses. In addition, the last five bases at the 3' end of the primer should be unique to
the pig retroviral sequence.

b. Primers should have no more than one or two mismatched bases based on the
miniswine, and retroviral sequences disclosed herein. These mismatched bases should not
be within the last three or four bases of the 3' end of the primer.

B. Additional criteria for: Miniswine-specific PCR assay

a. Primers are selected such that there are at least one or two mismatches between
miniswine and domestic pig sequences. At least one of these mismatches should be located
within the last three or four bases at the 3' end of the primer. Preferably, these mismatches
would be a change from either a G or C in miniswine to either an A or T in domestic pig.
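
The pair-level criteria and the 3' mismatch rule above can be sketched as follows. This is only an illustration: the source does not say how Tm is to be calculated, so the GC-content approximation used here (Tm = 64.9 + 41 x (G+C - 16.4)/N) is an assumption standing in for whatever the primer-design software reports, and the function names and test sequences are hypothetical.

```python
def gc_tm(seq):
    """Approximate Tm in deg C from GC content (assumed formula, see lead-in)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def pair_passes(fwd, rev, amplicon_bp, tm_range=(65.0, 70.0),
                max_tm_diff=3.0, size_range=(200, 800)):
    """Check criteria a to c: each Tm in range, Tm difference, and product size."""
    tms = (gc_tm(fwd), gc_tm(rev))
    return (all(tm_range[0] <= t <= tm_range[1] for t in tms)
            and abs(tms[0] - tms[1]) <= max_tm_diff
            and size_range[0] <= amplicon_bp <= size_range[1])


def three_prime_mismatches(primer, other_site, window=4):
    """Count mismatches within the last `window` bases against an aligned site."""
    if len(primer) != len(other_site):
        raise ValueError("primer and aligned site must have the same length")
    tail_a = primer[-window:].upper()
    tail_b = other_site[-window:].upper()
    return sum(a != b for a, b in zip(tail_a, tail_b))


# A miniswine-specific primer should show at least one mismatch in this window
# when compared with the corresponding domestic pig sequence (hypothetical 8-mers).
print(three_prime_mismatches("ACGTACGG", "ACGTACGA"))
```

Criterion d (repeats, self-complementarity, primer-dimers) is not covered by this sketch and would need a separate check.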

RT-PCR Strategy

There are a number of commercially available RT-PCR Kits for routine
amplification of fragments. Several primer pairs should be tested to confirm Tm and
specificity. Location of primers within the sequence depends in part on what question is
being answered. RT-PCR should answer questions about expression and presence of RV
sequences. PCR will not necessarily answer the question of whether the retroviral sequence
is full-length or encodes a replication competent retrovirus. A positive signal in these tests
only says there is RV sequence present. Indication of the possibility of full-length viral
genomes being present can be obtained by performing long PCR using primers in U5 and
U3. A commercial kit for long RT-PCR amplification is available (Takara RNA LA PCR
Kit). Confirmation of full-length viral genomes requires infectivity studies and/or isolation
of viral particles.

Northern analyses would complement RT-PCR data. Detection of bands at the
predicted size of full-length viral genomes with hybridization probes from env, U3 or U5
would provide stronger evidence. The presence of other small bands hybridizing would
indicate the amount of defective viral fragments present.

ELISA-Based Assay To Detect The Presence Of Porcine Retroviral Proteins, Polypeptides Or
Peptides

In addition to the use of nucleic acid-based, e.g., PCR-based assays, to detect the
presence of retroviral sequences, ELISA based assays can detect the presence of porcine
retroviral proteins, polypeptides and peptides.

The basic steps to developing an ELISA include (a) generation of porcine retroviral 
specific peptides, polypeptides and proteins; (b) generation of antibodies which are specific 
for the porcine retroviral sequences; (c) developing the assay. 

Using the retroviral sequences disclosed herein, antigenic peptides can be designed
using computer based programs such as MacVector or Geneworks to analyse the retroviral
sequences. Alternatively, it is possible to express the porcine retroviral sequences in gene
expression systems and to purify the expressed polypeptides or proteins. After synthesis,
the peptides, polypeptides or proteins are used to immunize mice or rabbits and to develop
serum containing antibodies.

Having obtained the porcine retroviral specific antibodies, the ELISA can be
developed as follows. ELISA plates are coated with a volume of polyclonal or monoclonal
antibody (capture antibody) which is reactive with the analyte to be tested. Such analytes
include porcine retroviruses or retroviral proteins such as env or p24. The ELISA plates are
then incubated at 4°C overnight. The coated plates are then washed and blocked with a
volume of a blocking reagent to reduce or prevent non-specific binding. Such
blocking reagents include bovine serum albumin (BSA), fetal bovine serum (FBS), milk, or
gelatin. The temperature for the blocking process is 37°C. Plates can be used immediately
or stored frozen at -20°C until needed. The plates are then washed, loaded with a serial
dilution of the analyte, incubated at 37°C, and washed again. Bound analyte is detected
using a detecting antibody. Detecting antibodies include enzyme-linked, fluoresceinated,
biotin-conjugated or other tagged polyclonal or monoclonal antibodies which are reactive
with the analyte. If monoclonal antibodies are used the detecting antibody should recognize
an epitope which is different from that recognized by the capture antibody.

Other Embodiments 

In another aspect, the invention provides a substantially pure nucleic acid having, or
comprising, a nucleotide sequence which encodes a swine or miniature swine, e.g., a
Tsukuba-1 retroviral gag polypeptide.

In preferred embodiments: the nucleic acid is or includes the nucleotide sequence
from nucleotides 2452-4839 of SEQ ID NO:1; the nucleic acid is at least 60%, 70%, 80%,
90%, 95%, 98%, or 99% homologous with a nucleic acid sequence corresponding to
nucleotides 2452-4839 of SEQ ID NO:1; or by a sequence which hybridizes under high
stringency conditions to nucleotides 2452-4839 of SEQ ID NO:1; the nucleic acid includes
a fragment of SEQ ID NO:1 which is at least 25, 50, 100, 200, 300, 400, 500, or 1,000
bases in length; the nucleic acid differs from the nucleotide sequence corresponding to
nucleotides 2452-4839 of SEQ ID NO:1 due to degeneracy in the genetic code; the nucleic
acid differs from the nucleic acid sequence corresponding to nucleotides 2452-4839 of SEQ
ID NO:1 by at least one nucleotide but by less than 5, 10, 15 or 20 nucleotides and
preferably which encodes an active peptide.

In yet another preferred embodiment, the nucleic acid of the invention hybridizes
under stringent conditions to a nucleic acid probe corresponding to at least 12 consecutive
nucleotides from nucleotides 2452-4839 of SEQ ID NO:1, or more preferably to at least 20
consecutive nucleotides from nucleotides 2452-4839 of SEQ ID NO:1, or more preferably
to at least 40 consecutive nucleotides from nucleotides 2452-4839 of SEQ ID NO:1.

In another aspect, the invention features a purified recombinant nucleic acid having
at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a nucleotide
sequence corresponding to nucleotides 2452-4839 of SEQ ID NO:1.

The invention also provides a probe or primer which includes or comprises a
substantially purified oligonucleotide. The oligonucleotide includes a region of nucleotide
sequence which hybridizes under stringent conditions to at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2452-4839 of SEQ ID NO:1, or naturally
occurring mutants thereof. In preferred embodiments, the probe or primer further includes
a label attached thereto. The label can be, e.g., a radioisotope, a fluorescent compound, an
enzyme, and/or an enzyme co-factor. Preferably the oligonucleotide is at least 10 and less
than 20, 30, 50, 100, or 150 nucleotides in length. Preferred primers of the invention
include oligonucleotides having a nucleotide sequence shown in any of SEQ ID NOs:32-37.

The invention involves nucleic acids, e.g., RNA or DNA, encoding a polypeptide of
the invention. This includes double stranded nucleic acids as well as coding and antisense
single strands.

In another aspect, the invention provides a substantially pure nucleic acid having, or 
comprising, a nucleotide sequence which encodes a swine or miniature swine, e.g., a 
Tsukuba-1 retroviral pol polypeptide.

In preferred embodiments: the nucleic acid is or includes the nucleotide sequence
corresponding to nucleotides 4871-8060 of SEQ ID NO:1; the nucleic acid is at least 60%,
70%, 80%, 90%, 95%, 98%, or 99% homologous with a nucleic acid sequence
corresponding to nucleotides 4871-8060 of SEQ ID NO:1; or by a sequence which
hybridizes under high stringency conditions to nucleotides 4871-8060 of SEQ ID NO:1; the
nucleic acid includes a fragment of SEQ ID NO:1 which is at least 25, 50, 100, 200, 300,
400, 500, or 1,000 bases in length; the nucleic acid differs from the nucleotide sequence
corresponding to nucleotides 4871-8060 of SEQ ID NO:1 due to degeneracy in the genetic
code; the nucleic acid differs from the nucleic acid sequence corresponding to nucleotides
4871-8060 of SEQ ID NO:1 by at least one nucleotide but by less than 5, 10, 15 or 20
nucleotides and which preferably encodes an active peptide.
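
To illustrate what it means for two nucleic acids to differ "due to degeneracy in the genetic code", the following sketch translates two coding sequences with the standard genetic code and checks that they differ at the nucleotide level while encoding the same peptide. The function names and both 12-nucleotide example sequences are hypothetical and are not taken from the Sequence Listing; the standard code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 1, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def differs_only_by_degeneracy(a: str, b: str) -> bool:
    """True if the sequences differ in nucleotides but encode the same peptide."""
    return a.upper() != b.upper() and translate(a) == translate(b)


original = "ATGCATCCCACG"    # hypothetical coding sequence: Met-His-Pro-Thr
synonymous = "ATGCACCCAACT"  # hypothetical variant with synonymous codons

print(translate(original), translate(synonymous))
print(differs_only_by_degeneracy(original, synonymous))
```

A change that alters the encoded amino acid (for example CAT to AAT, His to Asn) would not count as a degeneracy-only difference under this check.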

In yet another preferred embodiment, the nucleic acid of the invention hybridizes
under stringent conditions to a nucleic acid probe corresponding to at least 12 consecutive
nucleotides from nucleotides 4871-8060 of SEQ ID NO:1, or more preferably to at least 20
consecutive nucleotides from nucleotides 4871-8060 of SEQ ID NO:1, or more preferably
to at least 40 consecutive nucleotides from nucleotides 4871-8060 of SEQ ID NO:1.

In another aspect, the invention features a purified recombinant nucleic acid having
at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a nucleotide
sequence corresponding to nucleotides 4871-8060 of SEQ ID NO:1.

The invention also provides a probe or primer which includes or comprises a
substantially purified oligonucleotide. The oligonucleotide includes a region of nucleotide
sequence which hybridizes under stringent conditions to at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, or naturally
occurring mutants thereof. In preferred embodiments, the probe or primer further includes
a label attached thereto. The label can be, e.g., a radioisotope, a fluorescent compound, an
enzyme, and/or an enzyme co-factor. Preferably the oligonucleotide is at least 10 and less
than 20, 30, 50, 100, or 150 nucleotides in length. Preferred primers of the invention
include oligonucleotides having a nucleotide sequence shown in any of SEQ ID NOs:38-47.

The invention involves nucleic acids, e.g., RNA or DNA, encoding a polypeptide of 
the invention. This includes double stranded nucleic acids as well as coding and antisense 
single strands. 

In another aspect, the invention provides a substantially pure nucleic acid having, or
comprising, a nucleotide sequence which encodes a swine or miniature swine, e.g., a
Tsukuba-1 retroviral env polypeptide.

In preferred embodiments: the nucleic acid is or includes the nucleotide sequence
corresponding to nucleotides 2-1999 of SEQ ID NO:1; the nucleic acid is at least 60%,
70%, 80%, 90%, 95%, 98%, or 99% homologous with a nucleic acid sequence
corresponding to nucleotides 2-1999 of SEQ ID NO:1; or by a sequence which hybridizes
under high stringency conditions to nucleotides 2-1999 of SEQ ID NO:1; the nucleic acid
includes a fragment of SEQ ID NO:1 which is at least 25, 50, 100, 200, 300, 400, 500, or
1,000 bases in length; the nucleic acid differs from the nucleotide sequence corresponding
to nucleotides 2-1999 of SEQ ID NO:1 due to degeneracy in the genetic code; the nucleic
acid differs from the nucleic acid sequence corresponding to nucleotides 2-1999 of SEQ ID
NO:1 by at least one nucleotide but by less than 5, 10, 15 or 20 nucleotides and which
preferably encodes an active peptide.

In yet another preferred embodiment, the nucleic acid of the invention hybridizes
under stringent conditions to a nucleic acid probe corresponding to at least 12 consecutive
nucleotides from nucleotides 2-1999 of SEQ ID NO:1, or more preferably to at least 20
consecutive nucleotides from nucleotides 2-1999 of SEQ ID NO:1, or more preferably to at
least 40 consecutive nucleotides from nucleotides 2-1999 of SEQ ID NO:1.

In another aspect, the invention features a purified recombinant nucleic acid having
at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a nucleotide
sequence corresponding to nucleotides 2-1999 of SEQ ID NO:1.

The invention also provides a probe or primer which includes or comprises a
substantially purified oligonucleotide. The oligonucleotide includes a region of nucleotide
sequence which hybridizes under stringent conditions to at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2-1999 of SEQ ID NO:1, or naturally
occurring mutants thereof. In preferred embodiments, the probe or primer further includes
a label attached thereto. The label can be, e.g., a radioisotope, a fluorescent compound,
an enzyme, and/or an enzyme co-factor. Preferably the oligonucleotide is at least 10 and
less than 20, 30, 50, 100, or 150 nucleotides in length. Preferred primers of the invention
include oligonucleotides having a nucleotide sequence shown in any of SEQ ID NOs:6-31.
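
As a minimal sketch of how sense and antisense oligonucleotides of a given length could be enumerated from a numbered region of a sequence, the following assumes 1-based inclusive nucleotide coordinates, as used in the ranges above. The helper names and the toy coordinates are hypothetical, and the sketch does not model hybridization stringency or primer quality in any way.

```python
# Watson-Crick complement table for an unambiguous DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def region(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates outside sequence")
    return seq[start - 1:end]


def reverse_complement(seq: str) -> str:
    """Return the antisense strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_oligos(seq: str, length: int):
    """Yield (1-based start, sense, antisense) for every window of `length`."""
    for i in range(len(seq) - length + 1):
        sense = seq[i:i + length]
        yield i + 1, sense, reverse_complement(sense)


# First 30 nucleotides of SEQ ID NO:1 as given in the Sequence Listing,
# used here only as a short stand-in for a full-length region.
toy = "CTCGAGACTCGGTGGAAGGGCCCTTATCTC"
sub = region(toy, 2, 21)  # hypothetical coordinates, 20 nucleotides

for start, sense, antisense in candidate_oligos(sub, 12):
    print(start, sense, antisense)
```

With a 20-nucleotide region and a window of 12, nine candidate windows are produced; in practice each candidate would still have to be evaluated experimentally or with dedicated primer-design tools.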

The invention includes nucleic acids, e.g., RNA or DNA, encoding a polypeptide of
the invention. This includes double stranded nucleic acids as well as coding and antisense
single strands.

Included in the invention are: allelic variations, natural mutants, and induced mutants
that hybridize under high or low stringency conditions to the nucleic acid of SEQ ID NO:1,
2, or 3 (for definitions of high and low stringency see Current Protocols in Molecular
Biology, John Wiley & Sons, New York, 1989, 6.3.1-6.3.6, hereby incorporated by
reference).

The invention also includes purified preparations of swine or miniature swine
retroviral polypeptides, e.g., gag, pol, or env polypeptides, or fragments thereof, preferably
biologically active fragments, or analogs, of such polypeptides. In preferred embodiments:
the polypeptides are miniature swine retrovirus polypeptides; the polypeptides are
Tsukuba polypeptides; the polypeptides are gag, pol, or env polypeptides encoded by SEQ
ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its
complement, or naturally occurring variants thereof.

A biologically active fragment or analog is one having any in vivo or in vitro
activity which is characteristic of the Tsukuba-1 polypeptides described herein, or of other
naturally occurring Tsukuba-1 polypeptides. Fragments include those expressed in native
or endogenous cells, e.g., as a result of post-translational processing, e.g., as the result of
the removal of an amino-terminal signal sequence, as well as those made in expression
systems, e.g., in CHO cells. A useful polypeptide fragment or polypeptide analog is one
which exhibits a biological activity in any biological assay for Tsukuba-1 polypeptide
activity. Most preferably the fragment or analog possesses 10%, preferably 40%, or at least
90% of the activity of Tsukuba-1 polypeptides, in any in vivo or in vitro Tsukuba-1
polypeptide assay.

In order to obtain such polypeptides, polypeptide-encoding DNA can be
introduced into an expression vector, the vector introduced into a cell suitable for
expression of the desired protein, and the peptide recovered and purified, by prior art
methods. Antibodies to the polypeptides can be made by immunizing an animal, e.g., a
rabbit or mouse, and recovering antibodies by prior art methods.

The invention also features a purified nucleic acid which has at least 60%, 70%, 72%,
more preferably at least 85%, more preferably at least 90%, more preferably at least 95%,
most preferably at least 98%, 99% or 100% sequence identity or homology with SEQ ID
NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its
complement.

In preferred embodiments the nucleic acid is other than the entire retroviral genome
of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or
its complement, e.g., it is at least 1 nucleotide longer, or at least 1 nucleotide shorter, or
differs in sequence at at least one position. E.g., the nucleic acid is a fragment of the
sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its complement, or SEQ ID
NO:3 or its complement, or it includes sequence additional to that of SEQ ID NO:1 or its
complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement.

In preferred embodiments: the sequence of the nucleic acid differs from the
corresponding sequence of SEQ ID NO:1 or its complement, SEQ ID NO:2 or its
complement, or SEQ ID NO:3 or its complement, by 1, 2, 3, 4, or 5 base pairs; the
sequence of the nucleic acid differs from the corresponding sequence of SEQ ID NO:1 or
its complement, SEQ ID NO:2 or its complement, or SEQ ID NO:3 or its complement, by
at least 1, 2, 3, 4, or 5 base pairs but less than 6, 7, 8, 9, or 10 base pairs.

In other preferred embodiments: the nucleic acid is at least 10, more preferably at
least 15, more preferably at least 20, most preferably at least 25, 30, 50, 100, 1000, 2000,
4000, 6000, or 8060 nucleotides in length; the nucleic acid is less than 15, more preferably
less than 20, most preferably less than 25, 30, 50, 100, 1000, 2000, 4000, 6000, or 8060
nucleotides in length.
Equivalents 

Those skilled in the art will be able to recognize, or be able to ascertain using no
more than routine experimentation, numerous equivalents to the specific procedures
described herein. Such equivalents are considered to be within the scope of this invention
and are covered by the following claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

     (i) APPLICANT: Jay A. Fishman

    (ii) TITLE OF INVENTION: MOLECULAR SEQUENCE OF SWINE RETROVIRUS
                             AND METHODS OF USE

   (iii) NUMBER OF SEQUENCES: 74

    (iv) CORRESPONDENCE ADDRESS:
          (A) ADDRESSEE: LAHIVE & COCKFIELD, LLP
          (B) STREET: 60 State Street
          (C) CITY: Boston
          (D) STATE: Massachusetts
          (E) COUNTRY: USA
          (F) ZIP: 02109-1875

     (v) COMPUTER READABLE FORM:
          (A) MEDIUM TYPE: Floppy disk
          (B) COMPUTER: IBM PC compatible
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.25

    (vi) CURRENT APPLICATION DATA:
          (A) APPLICATION NUMBER:
          (B) FILING DATE:

   (vii) PRIOR APPLICATION DATA:
          (A) APPLICATION NUMBER: US 08/572,645
          (B) FILING DATE: 14-DEC-1995

  (viii) ATTORNEY/AGENT INFORMATION:
          (A) NAME: Louis Myers
          (B) REGISTRATION NUMBER: 35,965
          (C) REFERENCE/DOCKET NUMBER: MGP-038CP

    (ix) TELECOMMUNICATION INFORMATION:
          (A) TELEPHONE: (617)227-7400
          (B) TELEFAX: (617)227-5941
(2) INFORMATION FOR SEQ ID NO:1:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 8060 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CTCGAGACTC GGTGGAAGGG CCCTTATCTC GTACTTTTGA CCACACCAAC GGCTGTGAAA  60
GTCGAAGGAA TCTCCACCTG GATCCATGCA TCCCACGTTA AGCCGGCGCC ACCTCCCGAT  120
TCGGGGTGGA AAGCCGAAAA GACTGAAAAT CCCCTTAAGC TTCGCCTCCA TCGCGTGGTT  180
CCTTACTCTG TCAATAACCT CTCAGACTAA TGGTATGCGC ATAGGAGACA GCCTGAACTC  240
CCATAAACCC TTATCTCTCA CCTGGTTAAT TACTGACTCC GGCACAGGTA TTAATATCAA  300
CAACACTCAA GGGGAGGCTC CTTTAGGAAC CTGGTGGCCT GATCTATACG TTTGCCTCAG  360
ATCAGTTATT CCTAGTCTGA CCTCACCCCC AGATATCCTC CATGCTCACG GATTTTATGT  420
TTGCCCAGGA CCACCAAATA ATGGAAAACA TTGCGGAAAT CCCAGAGATT TCTTTTGTAA  480
ACAATGGAAC TGTGTAACCT CTAATGATGG ATATTGGAAA TGGCCAACCT CTCAGCAGGA  540
TAGGGTAAGT TTTTCTTATG TCAACACCTA TACCAGCTCT GGACAATTTA ATTACCTGAC  600
CTGGATTAGA ACTGGAAGCC CCAAGTGCTC TCCTTCAGAC CTAGATTACC TAAAAATAAG  660
TTTCACTGAG AAAGGAAAAC AAGAAAATAT CCTAAAATGG GTAAATGGTA TGTCTTGGGG  720
AATGGTATAT TATGGAGGCT CGGGTAAACA ACCAGGCTCC ATTCTAACTA TTCGCCTCAA  780
AATAAACCAG CTGGAGCCTC CAATGGCTAT AGGACCAAAT ACGGTCTTGA CGGGTCAAAG  840
ACCCCCAACC CAAGGACCAG GACCATCCTC TAACATAACT TCTGGATCAG ACCCCACTGA  900
GTCTAGCAGC ACGACTAAAA TGGGGGCAAA ACTTTTTAGC CTCATCCAGG GAGCTTTTCA  960
AGCTCTTAAC TCCACGACTC CAGAGGCTAC CTCTTCTTGT TGGCTATGCT TAGCTTTGGG  1020
CCCACCTTAC TATGAAGGAA TGGCTAGAAG AGGGAAATTC AATGTGACAA AAGAACATAG  1080
AGACCAATGC ACATGGGGAT CCCAAAATAA GCTTACCCTT ACTGAGGTTT CTGGAAAAGG  1140
CACCTGCATA GGAAAGGTTC CCCCATCCCA CCAACACCTT TGTAACCACA CTGAAGCCTT  1200
TAATCAAACC TCTGAAAGTC AATATCTGGT ACCTGGTTAT GACAGGTGGT GGGCATGTAA  1260
TACTGGATTA ACCCCTTGTG TTTCCACCTT GGTTTTTAAC CAAACTAAAG ATTTTTGCAT  1320
TATGGTCCAA ATTGTTCCCC GAGTGTATTA CTATCCCGAA AAAGCAATCC TTGATGAATA  1380
TGACTACAGA AATCATCGAC AAAAGAGAGA ACCCATATCT CTGACACTTG CTGTGATGCT  1440
CGGACTTGGA GTGGCAGCAG GTGTAGGAAC AGGAACAGCT GCCCTGGTCA CGGGACCACA  1500
GCAGCTAGAA ACAGGACTTA GTAACCTACA TCGAATTGTA ACAGAAGATC TCCAAGCCCT  1560
AGAAAAATCT GTCAGTAACC TGGAGGAATC CCTAACCTCC TTATCTGAAC TAGTCCTACA  1620
GAATAGAAGA GGGTTAGATT TATTATTTCT AAAAGAAGGA GGATTATGTG TAGCCTTGAA  1680
GGAGGAATGC TGTTTTTATG TGGATCATTC AGGGGCCATC AGAGACTCCA TGAACAAACT  1740
TAGAGAAAGG TTGGAGAAGC GTCGAAGGGA AAAGGAAACT ACTCAAGGGT GGTTTGAGGG  1800
ATGGTTCAAC AGGTCTCCTT GGTTGGCTAC CCTACTTTCT GCTTTAACAG GACCCTTAAT  1860
AGTCCTCCTC CTGTTACTCA CAGTTGGGCC ATGTATTATT AACAAGTTAA TTGCCTTCAT  1920
TAGAGAACGA ATAAGTGCAG TCCAGATCAT GGTACTTAGA CAACAGTACC AAAGCCCGTC  1980
TAGCAGGGAA GCTGGCCGCT AGCTCTACCA GTTCTAAGAT TAGAACTATT AACAAGAGAA  2040
GAAGTGGGGA ATGAAAGGAT GAAAATACAA CCTAAGCTAA TGAGAAGCTT AAAATTGTTC  2100
TGAATTCCAG AGTTTGTTCC TTATAGGTAA AAGATTAGGT TTTTTGCTGT TTTAAAATAT  2160
GCGGAAGTAA AATAGGCCCT GAGTACATGT CTCTAGGCAT GAAACTTCTT GAAACTATTT  2220
GAGATAACAA GAAAAGGGAG TTTCTAACTG CTTGTTTAGC TTCTGTAAAA CTGGTTGCGC  2280
CATAAAGATG TTGAAATGTT GATACACATA TCTTGGTGAC AACATGTCTC CCCCACCCCG  2340
AAACATGCGC AAATGTGTAA CTCTAAAACA ATTTAAATTA ATTGGTCCAC GAAGCGCGGG  2400
CTCTCGAAGT TTTAAATTGA CTGGTTTGTG ATATTTTGAA ATGATTGGTT TGTAAAGCGC  2460
GGGCTTTGCT GTGAACCCCA TAAAAGCTGT CCCGACTCCA CACTCGGGGC CGCAGTCCTC  2520
TACCCCTGCG TGGTGTACGA CTGTGGGCCC CAGCGCGCTT GGAATAAAAA TCCTCTTGCT  2580
GTTTGCATCA AGACCGCTTC TCGTGAGTGA TTAAGGGGAG TCGCCTTTTC CGAGCCTGGA  2640
GGTTCTTTTT GCTGGTCTTA CATTTGGGGG CTCGTCCGGG ATCTGTCGCG GCCACCCCTA  2700
ACACCCGAGA ACCGACTTGG AGGTAAAAAG GATCCTCTTT TTAACGTGTA TGCATGTACC  2760
GGCCGGCGTC TCTGTTCTGA GTGTCTGTTT TCAGTGGTGC GCGCTTTCGG TTTGCAGCTG  2820
TCCTCTCAGG CCGTAAGGGC TGGGGGACTG TGATCAGCAG ACGTGCTAGG AGGATCACAG  2880
GCTGCTGCCC TGGGGGACGC CCCGGGAGGT GAGGAGAGCC AGGGACGCCT GGTGGTCTCC  2940
TACTGTCGGT CAGAGGACCG AATTCTGTTG CTGAAGCGAA AGCTTCCCCC TCCGCGACCG  3000
TCCGACTCTT TTGCCTGCTT GTGGAATACG TGGACGGGTC ACGTGTGTCT GGATCTGTTG  3060
GTTTCTGTTT TGTGTGTCTT TGTCTTGTGT GTCCTTGTCT ACAGTTTTAA TATGGGACAG  3120
ACGGTGACGA CCCCTCTTAG TTTGACTCTC GACCATTGGA CTGAAGTTAA ATCCAGGGCT  3180
CATAATTTGT CAGTTCAGGT TAAGAAGGGA CCTTGGCAGA CTTTCTGTGT CTCTGAATGG  3240
CCGACATTCG ATGTTGGATG GCCATCAGAG GGGACCTTTA ATTCTGAGAT TATCCTGGCT  3300
GTTAAAGCAA TTATTTTTCA GACTGGACCC GGCTCTCATC CCGATCAGGA GCCCTATATC  3360
CTTACGTGGC AAGATTTGGC AGAGGATCCT CCGCCATGGG TTAAACCATG GCTGAATAAG  3420
CCAAGAAAGC CAGGTCCCCG AATTCTGGCT CTTGGAGAGA AAAACAAACA CTCGGCTGAA  3480
AAAGTCAAGC CCTCTCCTCA TATCTACCCC GAGATTGAGG AACCACCGGC TTGGCCGGAA  3540
CCCCAATCTG TTCCCCCACC CCCTTATCTG GCACAGGGTG CCGCGAGGGG ACCCTTTGCC  3600
CCTCCTGGAG CTCCGGCGGT GGAGGGACCT TCTGCAGGGA CTCGGAGCCG GAGGGGCGCC  3660
ACCCCGGAGC GGACAGACGA GATCGCGACA TTACCGCTGC GCACGTACGG CCCTCCCACA  3720
CCGGGGGGCC AATTGCAGCC CCTCCAGTAT TGGCCCTTTT CTTCTGCAGA TCTCTATAAT  3780
TGGAAAACTA ACCATCCCCC TTTCTCGGAG GATCCCCAAC GCCTCACGGG GTTGGTGGAG  3840
TCCCTTATGT TCTCTCACCA GCCTACTTGG GATGATTGTC AACAGCTGCT GCAGACACTC  3900
TTCACAACCG AGGAGCGAGA GAGAATTCTA TTAGAGGCTA GAAAAAATGT TCCTGGGGCC  3960
GACGGGCGAC CCACGCGGTT GCAAAATGAG ATTGACATGG GATTTCCCTT AACTCGCCCC  4020
GGTTGGGACT ACAACACGGC TGAAGGTAGG GAGAGCTTGA AAATCTATCG CCAGGCTCTG  4080
GTGGCGGGTC TCCGGGGCGC CTCAAGACGG CCCACTAATT TGGCTAAGGT AAGAGAAGTG  4140
ATGCAGGGAC CGAATGAACC CCCCTCTGTT TTTCTTGAGA GGCTCTTGGA AGCCTTCAGG  4200
CGGTACACCC CTTTTGATCC CACCTCAGAG GCCCAAAAAG CCTCAGTGGC TTTGGCCTTT  4260
ATAGGACAGT CAGCCTTGGA TATTAGAAAG AAGCTTCAGA GACTGGAAGG GTTACAGGAG  4320
GCTGAGTTAC GTGATCTAGT GAAGGAGGCA GAGAAAGTAT ATTACAAAAG GGAGACAGAA  4380
GAAGAAAGGG AACAAAGAAA AGAGAGAGAA AGAGAGGAAA GGGAGGAAAG ACGTAATAAA  4440
CGGCAAGAGA AGAATTTGAC TAAGATCTTG GCTGCAGTGG TTGAAGGGAA AAGCAATACG  4500
GAAAGAGAGA GAGATTTTAG GAAAATTAGG TCAGGCCCTA GACAGTCAGG GAACCTGGGC  4560
AATAGGACCC CACTCGACAA GGACCAATGT GCATATTGTA AAGAAAGAGG ACACTGGGCA  4620
AGGAACTGCC CCAAGAAGGG AAACAAAGGA CCAAGGATCC TAGCTCTAGA AGAAGATAAA  4680
GATTAGGGGA GACGGGGTTC GGACCCCCTC CCCGAGCCCA GGGTAACTTT GAAGGTGGAG  4740
GGGCAACCAG TTGAGTTCCT GGTTGATACC GGAGCGAAAC ATTCAGTGCT ACTACAGCCA  4800
TTAGGAAAAC TAAAAGATAA AAAATCCTGG GTGATGGGTG CACAGGGCAA CAACAGTATC  4860
CATGGACTAC CCGAAGACAG TTGACTTGGG AGTGGGACGG GTAACCCACT CGTTTCTGGT  4920
CATACCTGAG TGCCCAGCAC CCCTCTTAGG TAGAGACTTA TTGACCAAGA TGGGAGCACA  4980
AATTTCTTTT GAACAAGGGA AACCAGAAGT CTCTGCAAAT AACAAACCTA TCACTGTGTT  5040
GACCCTCCAA TTAGATGACG AATATCGACT ATACTCTCCC CTAGTAAAGC CTGATCAAAA  5100
TATACAATTC TGGTTGGAAC AGTTTCCCCA AGCCTGGGCA GAAACCGCAG GGATGGGTTT  5160
GGCAAAGCAA GTTCCCCCAC AAGTTATTCA ACTGAAGGCC AGTGCCACAC CAGTGTCAGT  5220
CAGACAGTAC CCCTTGAGTA AAGAAGCTCA AGAAGGAATT CGGCCGCATG TCCAAAGATT  5280
AATCCAACAG GGCATCCTAG TTCCTGTCCA ATCTCCCTGG AATACTCCCC TGCTACCGGT  5340
TAGAAAGCCT GGGACTAATG ACTATCGACC AGTACAGGAC TTGAGAGAGG TCAATAAACG  5400
GGTGCAGGAT ATACACCCAA CAGTCCCGAA CCCTTATAAC CTCTTGTGTG CTCTCCCACC  5460
CCAACGGAGC TGGTATACAG TATTGGACTT AAAGGATGCC TTCTTCTGCC TGAGATTACA  5520
CCCCACTAGC CAACCACTTT TTGCCTTCGA ATGGAGAGAT CCAGGTACGG GAAGAACCGG  5580
GCAGCTCACC TGGACCCGAC TGCCCCAAGG GTTCAAGAAC TCCCCGACCA TCTTTGACGA  5640
AGCCCTACAC AGAGACCTGG CCAACTTCAG GATCCAACAC CCTCAGGTGA CCCTCCTCCA  5700
GTACGTGGAT GACCTGCTTC TGGCGGGAGC CACCAAACAG GACTGCTTAG AAGGCACGAA  5760
GGCACTACTG CTGGAATTGT CTGACCTAGG CTACAGAGCC TCTGCTAAGA AGGCCCAGAT  5820
TTGCAGGAGA GAGGTAACAT ACTTGGGGTA CAGTTTACGG GACGGGCAGC GATGGCTGAC  5880
GGAGGCACGG AAGAAAACTG TAGTCCAGAT ACCGGCCCCA ACCACAGCCA AACAAATGAG  5940
AGAGTTTTTG GGGACAGCTG GATTTTGCAG ACTGTGGATC CCGGGGTTTG CGACCTTAGC  6000
AGCCCCACTC TACCCGCTAA CCAAAGAAAA AGGGGAATTC TCCTGGGCTC CTGAGCACCA  6060
GAAGGCATTT GATGCTATCA AAAAGGCCCT GCTGAGCGCA CCTGCTCTGG CCCTCCCTGA  6120
CGTAACTAAA CCCTTTACCC TTTATGTGGA TGAGCGTAAG GGAGTAGCCC GGGGAGTTTT  6180
AACCCAAACC CTAGGACCAT GGAGAAGACC TGTCGCCTAC CTGTCAAAGA AGCTCGATCC  6240
TGTAGCCAGT GGTTGGCCCA TATGCCTGAA GGCTATCGCA GCTGTGGCCA TACTGGTCAA  6300
GGACGCTGAC AAATTGACTT TGGGACAAGA ATATAACTGT AATAGCCCCC CATGCATTGG  6360
AGAACATCGT TCGGCAGCCC CCAGACCGAT GGATGACCAA CGCCCGCATG ACCCACTATC  6420
AAAGCCTGCT TCTCACAGAG AGGGTCACGT TCGCTCCACC AACCGCTCTC AACCCTGCCA  6480
CTCTTCTGCC TGAAGAGACT GATGAACCAG TGACTCATGA TTGCCATCAA CTATTGATTG  6540
AGGAGACTGG GGTCCGCAAG GACCTTACAG ACATACCGCT GACTGGAGAA GTGCTAACCT  6600
GGTTCACTGA CGGAAGCAGC TATGTGGTGG AAGGTAAGAG GATGGCTGGG GCGGCGGTGG  6660
TGGACGGGAC CCGCACGATC TGGGCCAGCA GCCTGCCGGG AGGAACTTCA GCACAAAAGG  6720
CTGAGCTCAT GGCCCTCACG CAAGCTTTGC GGCTGGCCGA AGGGAAATCC ATAAACATTT  6780
ATACGGACAG CAGGTATGCC TTTGCGACTG CACACGTACA TGGGGCCATC TATAAACAAA  6840
GGGGGTTGCT TACCTCAGCA GGGAGGGAAA TAAAGAACAA AGAGGAAATT CTAAGCCTAT  6900
TAGAAGCCGT ACATTTACCA AAAAGGCTAG CTATTATACA CTGTCCTGGA CATCAGAAAG  6960
CTAAAGATCT CATATCCAGA GGAAACCAGA TGGCTGACCG GGTTGCCAAG CAGGCAGCCC  7020
AGGGTGTTAA CCTTCTGCCT ATAATAGAAA TGCCCAAAGC CCCAGAACCC AGACGACAGT  7080
ACACCCTAGA AGACTGGCAA GAGATAAAAA AGATAGACCA TTCTCTGAGA CTCCGGAAGG  7140
GACCTGCTAT ACCTCAGATG GGAAGGAAAT CCTGCCCCAC AAAGAAGGGT TAGAATATGT  7200
CCAACAAGAT ACATCGTCTA ACCCACCTAG GAACTAAACA CCTGCAGCAG TTGGTCAGAA  7260
CATCCCCTTA TCATGTTCTG AGGCTACCAG GAGTGGCTGA CTCGGTGGTC AAACATTGTG  7320
TGCCCTGCCA GCTGGTTAAT GCTAATCCTT CCAGAATGCC TCCAGGGAAG AGACTAAGGG  7380
GAAGCCACCC AGGCGCTCAC TGGGAAGTGG ACTTCACTGA GGTAAAGCCG GCTAAATATG  7440
GAAACAAATA CCTATTGGTT TTTGTAGACA CCTTTTCAGG ATGGGTAGAG GCTTATCCTA  7500
CTAAGAAAGA GACTTCAACC GTGGTAGCTA AAAAAATACT GGAAGAAATT TTTCCAAGAT  7560
TTGGAATACC TAAGGTAATA GGGTCAGACA ATGGTCCAGC TTTTGTTGCC CAGGTAAGTC  7620
AGGGACTGGC CAAGATATTG GGGATTGATT GGAAACTGCA TTGTGCATAC AGACCCCAAA  7680
GCTCAGGACA GGTAGAGAGG ATGAATAGAA CCATTAAAGA GACCCTTACT AAATTGACCG  7740
CGGAGACTGG CGTTAATGAT TGGATAGCTC TCCTGCCCTT TGTGCTTTTT AGGGTTAGGA  7800
ACACCCCTGG ACAGTTTGGG CTGACCCCCT ATGAATTACT CTACGGGGGA CCCCCCCCAT  7860
TGGTAGAAAT TGCTTCTGTA CATAGTGCTG ATGTGCTGCT TTCCCAGCCT TTGTTCTCTA  7920
GGCTCAAGGC ACTTGAGTGG GTGAGACAAC GAGCGTGGAG GCAACTCCGG GAGGCCTACT  7980
CAGGAGGAGG AGACTTGCAG ATCCCACATC GTTTCCAAGT GGGAGATTCA GTCTACGTTA  8040
GACGCCACCG TGCAGGAAAC  8060
(2) INFORMATION FOR SEQ ID NO:2:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 7333 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

CTACCCCTGC GTGGTGTACG ACTGTGGGCC CCAGCGCGCT TGGAATAAAA ATCCTCTTGC  60
TGTTTGCATC AAGACCGCTT CTTGTGAGTG ATTTGGGGTG TCGCCTCTTC CGAGCCCGGA  120
CGAGGGGGAT TGTTCTTTTA CTGGCCTTTC ATTTGGTGCG TTGGCCGGGA AATCCTGCGA  180
CCACCCCTTA CACCCGAGAA CCGACTTGGA GGTAAAGGGA TCCCCTTTGG AACATATGTG  240
TGTGTCGGCC GGCGTCTCTG TTCTGAGTGT CTGTTTTCGG TGATGCGCGC TTTCGGTTTG  300
CAGCTGTCCT CTCAGACCGT AAGGACTGGA GGACTGTGAT CAGCAGACGT GCTAGGAGGA  360
TCACAGGCTG CCACCCTGGG GGACGCCCCG GGAGGTGGGG AGAGCCAGGG ACGCCTGGTG  420
GTCTCCTACT GTCGGTCAGA GGACCGAGTT CTGTTGTTGA AGCGAAAGCT TCCCCCTCCG  480
CGGCCGTCCG ACTCTTTTGC CTGCTTGTGG AAGACGCGGA CGGGTCGCGT GTGTCTGGAT  540
CTGTTGGTTT CTGTTTCGTG TGTCTTTGTC TTGTGCGTCC TTGTCTACAG TTTTAATATG  600
GGACAGACAG TGACTACCCC CCTTAGTTTG ACTCTCGACC ATTGGACTGA AGTTAGATCC  660
AGGGCTCATA ATTTGTCAGT TCAGGTTAAG AAGGGACCTT GGCAGACTTT CTGTGCCTCT  720
GAATGGCCAA CATTCGATGT TGGATGGCCA TCAGAGGGGA CCTTTAATTC TGAAATTATC  780
CTGGCTGTTA AGGCAATCAT TTTTCAGACT GGACCCGGCT CTCATCCTGA TCAGGAGCCC  840
TATATCCTTA CGTGGCAAGA TTTGGCAGAA GATCCTCCGC CATGGGTTAA ACCATGGCTA  900
AATAAACCAA GAAAGCCAGG TCCCCGAATC CTGGCTCTTG GAGAGAAAAA CAAACACTCG  960
GCCGAAAAAG TCGAGCCCTC TCCTCGTATC TACCCCGAGA TCGAGGAGCC GCCGACTTGG  1020
CCGGAACCCC AACCTGTTCC CCCACCCCCT TATCCAGCAC AGGGTGCTGT GAGGGGACCC  1080
TCTGCCCCTC CTGGAGCTCC GGTGGTGGAG GGACCTGCTG CCGGGACTCG GAGCCGGAGA  1140
GGCGCCACCC CGGAGCGGAC AGACGAGATC GCGATATTAC CGCTGCGCAC CTATGGCCCT  1200
CCCATGCCAG GGGGCCAATT GCAGCCCCTC CAGTATTGGC CCTTTTCTTC TGCAGATCTC  1260
TATAATTGGA AAACTAACCA TCCCCCTTTC TCGGAGGATC CCCAACGCCT CACGGGGTTG  1320
GTGGAGTCCC TTATGTTCTC TCACCAGCCT ACTTGGGATG ATTGTCAACA GCTGCTGCAG  1380
ACACTCTTCA CAACCGAGGA GCGAGAGAGA ATTCTGTTAG AGGCTAAAAA AAATGTTCCT  1440
GGGGCCGACG GGCGACCCAC GCAGTTGCAA AATGAGATTG ACATGGGATT TCCCTTGACT  1500
CGCCCCGGTT GGGACTACAA CACGGCTGAA GGTAGGGAGA GCTTGAAAAT CTATCGCCAG  1560
GCTCTGGTGG CGGGTCTCCG GGGCGCCTCA AGACGGCCCA CTAATTTGGC TAAGGTAAGA  1620
GAGGTGATGC AGGGACCGAA CGAACCTCCC TCGGTATTTC TTGAGAGGCT CATGGAAGCC  1680
TTCAGGCGGT TCACCCCTTT TGATCCTACC TCAGAGGCCC AGAAAGCCTC AGTGGCCCTG  1740
GCCTTCATTG GGCAGTCGGC TCTGGATATC AGGAAGAAAC TTCAGAGACT GGAAGGGTTA  1800
CAGGAGGCTG AGTTACGTGA TCTAGTGAGA GAGGCAGAGA AGGTGTATTA CAGAAGGGAG  1860
ACAGAAGAGG AGAAGGAACA GAGAAAAGAA AAGGAGAGAG AAGAAAGGGA GGAAAGACGT  1920
GATAGACGGC AAGAGAAGAA TTTGACTAAG ATCTTGGCCG CAGTGGTTGA AGGGAAGAGC  1980
AGCAGGGAGA GAGAGAGAGA TTTTAGGAAA ATTAGGTCAG GCCCTAGACA GTCAGGGAAC  2040
CTGGGCAATA GGACCCCACT CGACAAGGAC CAGTGTGCGT ATTGTAAAGA AAAAGGACAC  2100
TGGGCAAGGA ACTGCCCCAA GAAGGGAAAC AAAGGACCGA AGGTCCTAGC TCTAGAAGAA  2160
GATAAAGATT AGGGGAGACG GGGTTCGGAC CCCCTCCCCG AGCCCAGGGT AACTTTGAAG  2220
GTGGAGGGGC AACCAGTTGA GTTCCTGGTT GATACCGGAG CGGAGCATTC AGTGCTGCTA  2280
CAACCATTAG GAAAACTAAA AGAAAAAAAA TCCTGGGTGA TGGGTGCCAC AGGGCAACGG  2340
CAGTATCCAT GGACTACCCG AAGAACCGTT GACTTGGGAG TGGGACGGGT AACCCACTCG  2400
TTTCTGGTCA TCCCTGAGTG CCCAGTACCC CTTCTAGGTA GAGACTTACT GACCAAGATG  2460
GGAGCTCAAA TTTCTTTTGA ACAAGGAAGA CCAGAAGTGT CTGTGAATAA CAAACCCATC  2520
ACTGTGTTGA CCCTCCAATT AGATGATGAA TATCGACTAT ATTCTCCCCA AGTAAAGCCT  2580
GATCAAGATA TACAGTCCTG GTTGGAGCAG TTTCCCCAAG CCTGGGCAGA AACCGCAGGG  2640
ATGGGTTTGG CAAAGCAAGT TCCCCCACAG GTTATTCAAC TGAAGGCCAG TGCTACACCA  2700
GTATCAGTCA GACAGTACCC CTTGAGTAGA GAGGCTCGAG AAGGAATTTG GCCGCATGTT  2760
CAAAGATTAA TCCAACAGGG CATCCTAGTT CCTGTCCAAT CCCCTTGGAA TACTCCCCTG  2820
CTACCGGTTA GGAAGCCTGG GACCAATGAT TATCGACCAG TACAGGACTT GAGAGAGGTC  2880
AATAAAAGGG TGCAGGACAT ACACCCAACG GTCCCGAACC CTTATAACCT CTTGAGCGCC  2940
CTCCCGCCTG AACGGAACTG GTACACAGTA TTGGACTTAA AAGATGCCTT CTTCTGCCTG  3000
AGATTACACC CCACTAGCCA ACCACTTTTT ACCTTCGAAT GGAGAGATCC AGGTACGGGA  3060
AGAACCGGGC AGCTCACCTG GACCCGACTG CCCCAAGGGT TCAAGAACTC CCCGACCATC  3120
TTTGACGAAG CCCTACACAG GGACCTGGCC AACTTCAGGA TCCAACACCC TCAGGTGACC  3180
CTCCTCCAGT ACGTGGATGA CCTGCTTCTG GCGGGAGCCA CCAAACAGGA CTGCTTAGAA  3240
GGTACGAAGG CACTACTGCT GGAATTGTCT GACCTAGGCT ACAGAGCCTC TGCTAAGAAG  3300
GCCCAGATTT GCAGGAGAGA GGTAACATAC TTGGGGTACA GTTTGCGGGG CGGGCAGCGA  3360
TGGCTGACGG AGGCACGGAA GAAAACTGTA GTCCAGATAC CGGCCCCAAC CACAGCCAAA  3420
CAAGTGAGAG AGTTTTTGGG GACAGCTGGA TTTTGCAGAC TGTGGATCCC GGGGTTTGCG  3480
ACCTTAGCAG CCCCACTCTA CCCGCTAACC AAAGAAAAAG GGGGTTGCTT ACCTCAGCAG  3540
GGAGGGAAAT AAAGAACAAA GAGGAAATTC TAAGCCTATT AGAAGCCTTA CATTTGCCAA  3600
AAAGGCTAGC TATTATACAC TGTCCTGGAC ATCAGAAAGC CAAAGATCTC ATATCTAGAG  3660
GGAACCAGAT GGCTGACCGG GTTGCCAAGC AGGCAGCCCA GGCTGTTAAC CTTCTGCCTA  3720
TAATAGAAAC GCCCAAAGCC CCAGAACCCA GACGACAGTA CACCCTAGAA GACTGGCAAG  3780
AGATAAAAAA GATAGACCAG TTCTCTGAGA CTCCGGAGGG GACCTGCTAT ACCTCATATG  3840
GGAAGGAAAT CCTGCCCCAC AAAGAAGGGT TAGAATATGT CCAACAGATA CATCGTCTAA  3900
CCCACCTAGG AACTAAACAC CTGCAGCAGT TGGTCAGAAC ATCCCCTTAT CATGTTCTGA  3960
GGCTACCAGG AGTGGCTGAC TCGGTGGTCA AACATTGTGT GCCCTGCCAG CTGGTTAATG  4020
CTAATCCTTC CAGAATACCT CCAGGAAAGA GACTAAGGGG AAGCCACCCA GGCGCTCACT  4080
GGGAAGTGGA CTTCACTGAG GTAAAGCCGG CTAAATACGG AAACAAATAT CTATTGGTTT  4140
TTGTAGACAC CTTTTCAGGA TGGGTAGAGG CTTATCCTAC TAAAAAAGAG ACTTCAACCG  4200
TGGTGGCTAA GAAAATACTG GAGGAAATTT TTCCAAGATT TGGAATACCT AAGGTAATAG  4260
GGTCAGACAA TGGTCCAGCT TTCGTTGCCC AGGTAAGTCA GGGACTGGCC AAGATATTGG  4320
GGATTGATTG AAAACTGCAT TGTGCATACA GACCCCAAAG CTCAGGACAG GTAGAGAGGA  4380
TGAATAGAAC CATTAAAGAG ACCCTTACCA AATTGACCAC AGAGACTGGC ATTAATGATT  4440
GGATGGCTCT CCTGCCCTTT GTGCTTTTTA GGGTGAGGAA CACCCCTGGA CAGTTTGGGC  4500
TGACCCCCTA TAAATTGCTC TACGGGGGAC CCCCCCCGTT GGCAGAAATT GCCTTTGCAC  4560
ATAGTGCTGA TGTGCTGCTT TCCCAGCCTT TGTTCTCTAG GCTCAAGGCG CTCGAGTGGG  4620
TGAGGCAGCG AGCGTGGAAG CAGCTCCGGG AGGCCTACTC AGGAGGAGAC TTGCAAGTTC  4680
CACATCGCTT CCAAGTTGGA GATTCAGTCT ATGTTAGACG CCACCGTGCA GGAAACCTCG  4740
AGACTCGGTA GAAGGGACCT TATCTCGTAC TTTTGACCAC ACCAACGGCT GTGAAAGTCG  4800
AAGGAATCCC CTTAAGCTTC GCCTCCATCG CGTGGTTCCT TACTCTGTCA ATAACTCCTC  4860
AAGTTAATGG TAAACGCCTT GTGGACAGCC CGAACTCCCA TAAACCCTTA TCTCTCACCT  4920
GGTTACTTAC TGACTCCGGT ACAGGTATTA ATATTAACAG CACTCAAGGG GAGGCTCCCT  4980
TGGGGACCTG GTGGCCTGAA TTATATGTCT GCCTTCGATC AGTAATCCCT GGTCTCAATG  5040
ACCAGGCCAC ACCCCCCGAT GTACTCCGTG CTTACGGGTT TTACGTTTGC CCAGGACCCC  5100
CAAATAATGA AGAATATTGT GGAAATCCTC AGGATTTCTT TTGCAAGCAA TGGAGCTGCA  5160
TAACTTCTAA TGATGGGAAT TGGAAATGGC CAGTCTCTCA GCAAGACAGA GTAAGTTACT  5220
CTTTTGTTAA CAATCCTACC AGTTATAATC AATTTAATTA TGGCCATGGG AGATGGAAAG  5280
ATTGGCAACA GCGGGTACAA AAAGATGTAC GAAATAAGCA AATAAGCTGT CATTCGTTAG  5340
ACCTAGATTA CTTAAAAATA AGTTTCACTG AAAAAGGAAA ACAAGAAAAT ATTCAAAAGT  5400
GGGTAAATGG TATATCTTGG GGAATAGTGT ACTATGGAGG CTCTGGGAGA AAGAAAGGAT  5460
CTGTTCTGAC TATTCGCCTC AGAATAGAAA CTCAGATGGA ACCTCCGGTT GCTATAGGAC  5520
CAAATAAGGG TTTGGCCGAA CAAGGACCTC CAATCCAAGA ACAGAGGCCA TCTCCTAACC  5580
CCTCTGATTA CAATACAACC TCTGGATCAG TCCCCACTGA GCCTAACATC ACTATTAAAA  5640
CAGGGGCGAA ACTTTTTAGC CTCATCCAGG GAGCTTTTCA AGCTCTTAAC TCCACGACTC  5700
CAGAGGCTAC CTCTTCTTGT TGGCTTTGCT TAGCTTCGGG CCCACCTTAC TATGAGGGAA  5760
TGGCTAGAGG AGGGAAATTC AATGTGACAA AGGAACATAG AGACCAATGT ACATGGGGAT  5820
CCCAAAATAA GCTTACCCTT ACTGAGGTTT CTGGAAAAGG CACCTGCATA GGGATGGTTC  5880
CCCCATCCCA CCAACACCTT TGTAACCACA CTGAAGCCTT TAATCGAACC TCTGAGAGTC  5940
AATATCTGGT ACCTGGTTAT GACAGGTGGT GGGCATGTAA TACTGGATTA ACCCCTTGTG  6000
TTTCCACCTT GGTTTTCAAC CAAACTAAAG ACTTTTGCGT TATGGTCCAA ATTGTCCCCC  6060
GGGTGTACTA CTATCCCGAA AAAGCAGTCC TTGATGAATA TGACTATAGA TATAATCGGC  6120
CAAAAAGAGA GCCCATATCC CTGACACTAG CTGTAATGCT CGGATTGGGA GTGGCTGCAG  6180
GCGTGGGAAC AGGAACGGCT GCCCTAATCA CAGGACCGCA ACAGCTGGAG AAAGGACTTA  6240
GTAACCTACA TCGAATTGTA ACGGAAGATC TCCAAGCCCT AGAAAAATCT GTCAGTAACC  6300
TGGAGGAATC CCTAACCTCC TTATCTGAAG TGGTTCTACA GAACAGAAGG GGGTTAGATC  6360
TGTTATTTCT AAAAGAAGGA GGGTTATGTG TAGCCTTAAA AGAGGAATGC TGCTTCTATG  6420
TAGATCACTC AGGAGCCATC AGAGACTCCA TGAGCAAGCT TAGAGAAAGG TTAGAGAGGC  6480
GTCGAAGGGA AAGAGAGGCT GACCAGGGGT GGTTTGAAGG ATGGTTCAAC AGGTCTCCTT  6540
GGATGACCAC CCTGCTTTCT GCTCTGACGG GGCCCCTAGT AGTCCTGCTC CTGTTACTTA  6600
CAGTTGGGCC TTGCTTAATT AATAGGTTTG TTGCCTTTGT TAGAGAACGA GTGAGTGCAG  6660
TCCAGATCAT GGTACTTAGG CAACAGTACC AAGGCCTTCT GAGCCAAGGA GAAACTGACC  6720
TCTAGCCTTC CCAGTTCTAA GATTAGAACT ATTAACAAGA CAAGAAGTGG GGAATGAAAG  6780
GATGAAAATG CAACCTAACC CTCCCAGAAC CCAGGAAGTT AATAAAAAGC TCTAAATGCC  6840
CCCGAATTCC AGACCCTGCT GGCTGCCAGT AAATAGGTAG AAGGTCACAC TTCCTATTGT  6900
TCCAGGGCCT GCTATCCTGG CCTAAGTAAG ATAACAGGAA ATGAGTTGAC TAATCGCTTA  6960
TCTGGATTCT GTAAAACTGA CTGGCACCAT AGAAGAATTG ATTACACATT GACAGCCCTA  7020
GTGACCTATC TCAACTGCAA TCTGTCACTC TGCCCAGGAG CCCACGCAGA TGCGGACCTC  7080
CCGAGCTATT TTAAAATGAT TGGTCCACGG AGCGCGGGCT CTCGATATTT TAAAATGATT  7140
GGTCCATGGA GCGCGGGCTC TCGATATTTT AAAATGATTG GTTTGTGACG CACAGGCTTT  7200
GTTGTGAACC CCATAAAAGC TGTCCCGATT CCGCACTCGG GGCCGCAGTC CTCTACCCCT  7260
GCGTGGTGTA CGACTGTGGG CCCCAGCGCG CTTGGAATAA AAATCCTCTT GCTGTTTGCA  7320
TCAAAAAAAA AAA  7333
(2) INFORMATION FOR SEQ ID NO : 3 : 

15 

(i.) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8132 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 
20 (D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

25 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 : 

GCGTGGTGTA CGACTGTGGG CCCCAGCGCG CTTGGAATAA AAATCCTCTT GCTGTTTGCA 60

TCAAGACCGC TTCTCGTGAG TGATTAAGGG GAGTCGCCTT TTCCGAGCCT GGAGGTTCTT 120

TTTGCTGGTC TTACATTTGG GGGCTCGTCC GGGATCTGTC GCGGCCACCC CTAACACCCG 180

AGAACCGACT TGGAGGTAAA AAGGATCCTC TTTTTAACGT GTATGCATGT ACCGGCCGGC 240

GTCTCTGTTC TGAGTGTCTG TTTTCAGTGG TGCGCGCTTT CGGTTTGCAG CTGTCCTCTC 300

AGGCCGTAAG GGCTGGGGGA CTGTGATCAG CAGACGTGCT AGGAGGATCA CAGGCTGCTG 360

CCCTGGGGGA CGCCCCGGGA GGTGAGGAGA GCCAGGGACG CCTGGTGGTC TCCTACTGTC 420

GGTCAGAGGA CCGAATTCTG TTGCTGAAGC GAAAGCTTCC CCCTCCGCGA CCGTCCGACT 480

CTTTTGCCTG CTTGTGGAAG ACGTGGACGG GTCACGTGTG TCTGGATCTG TTGGTTTCTG 540

TTTTGTGTGT CTTTGTCTTG TGTGTCCTTG TCTACAGTTT TAATATGGGA CAGACGGTGA 600

CGACCCCTCT TAGTTTGACT CTCGACCATT GGACTGAAGT TAAATCCAGG GCTCATAATT 660

TGTCAGTTCA GGTTAAGAAG GGACCTTGGC AGACTTTCTG TGTCTCTGAA TGGCCGACAT 720

TCGATGTTGG ATGGCCATCA GAGGGGACCT TTAATTCTGA GATTATCCTG GCTGTTAAAG 780

CAGTTATTTT TCAGACTGGA CCCGGCTCTC ATCCCGATCA GGAGCCCTAT ATCCTTACGT 840

GGCAAGATTT GGCAGAGGAT CCTCCGCCAT GGGTTAAACC ATGGCTGAAT AAGCCAAGAA 900

ACCCAGGTCC CCGAATTCTG GCTCTTGGAG AGAAAAACAA ACACTCGGCT GAAAAAGTCA 960

AGCCCTCTCC TCATATCTAC CCCGAGATTG AGGAGCCACC GGCTTGGCCG GAACCCCAAT 1020

CTGTTCCCCC ACCCCCTTAT CTGGCACAGG GTGCCGCGAG GGGACCCTTT GCCCCTCCTG 1080

GAGCTCCGGC GGTGGAGGGA CCTGCTGCAG GGACTCGGAG CCGGAGGGGC GCCACCCCGG 1140

AGCGGACAGA CGAGATCGCG ACATTACCGC TGCGCACGTA CGGCCCTCCC ACACCGGGGG 1200

GCCAATTGCA GCCCCTCCAG TATTGGCCCT TTTCTTCTGC AGATCTCTAT AATTGGAAAA 1260

CTAACCATCC CCCTTTCTCG GAGGATCCCC AACGCCTCAC GGGGTTGGTG GAGTCCCTTA 1320

TGTTCTCTCA CCAGCCTACT TGGGATGATT GTCAACAGCT GCTGCAGACA CTCTTCACAA 1380

CCGAGGAGCG AGAGAGAATT CTATTAGAGG CTAGAAAAAA TGTTCCTGGG GCCGACGGGC 1440

GACCCACGCG GTTGCAAAAT GAGATTGACA TGGGATTTCC CTTAACTCGC CCCGGTTGGG 1500

ACTACAACAC GGCTGAAGGT AGGGAGAGCT TGAAAATCTA TCGCCAGGCT CTGGTGGCGG 1560

GTCTCCGGGG CGCCTCAAGA CGGCCCACTA ATTTGGCTAA GGTAAGAGAA GTGATGCAGG 1620

GACCGAATGA ACCCCCCTCT GTTTTTCTTG AGAGGCTCTT GGAAGCCTTC AGGCGGTACA 1680

CCCCTTTTGA TCCCACCTCA GAGGCCCAAA AAGCCTCAGT GGCTTTGGCC TTTATAGGAC 1740

AGTCAGCCTT GGATATTAGA AAGAAGCTTC AGAGACTGGA AGGGTTACAG GAGGCTGAGT 1800

TACGTGATCT AGTGAAGGAG GCAGAGAAAG TATATTACAA AAGGGAGACA GAAGAAGAAA 1860

GGGAACAAAG AAAAGAGAGA GAAAGAGAGG AAAGGGAGGA AAGACGTAAT AAACGGCAAG 1920

AGAAGAATTT GACTAAGATC TTGGCTGCAG TGGTTGAAGG GAAAAGCAAT ACGGAAAGAG 1980

AGAGAGATTT TAGGAAAATT AGGTCAGGCC CTAGACAGTC AGGGAACCTG GGCAATAGGA 2040

CCCCACTCGA CAAGGACCAA TGTGCATATT GTAAAGAAAG AGGACACTGG GCAAGGAACT 2100

GCCCCAAGAA GGGAAACAAA GGACCAAGGA TCCTAGCTCT AGAAGAAGAT AAAGATTAGG 2160

GGAGACGGGG TTCGGACCCC CTCCCCGAGC CCAGGGTAAC TTTGAAGGTG GAGGGGCAAC 2220

CAGTTGAGTT CCTGGTTGAT ACCGGAGCGA AACATTCAGT GCTACTACAG CCATTAGGAA 2280

AACTAAAAGA TAAAAAATCC TGGGTGATGG GTGCCACAGG GCAACAACAG TATCCATGGA 2340

CTACCCGAAG AACAGTTGAC TTGGGAGTGG GACGGGTAAC CCACTCGTTT CTGGTCATAC 2400

CTGAGTGCCC AGCACCCCTC TTAGGTAGAG ACTTATTGAC CAAGATGGGA GCACAAATTT 2460

CTTTTGAACA AGGGAAACCA GAAGTGTCTG CAAATAACAA ACCTATCACT GTGTTGACCC 2520

TCCAATTAGA TGACGAATAT CGACTATACT CTCCCCTACT AAAGCCTGAT CAAAATATAC 2580

AATTCTGGTT GGAACAGTTT CCCCAAGCCT GGGCAGAAAC CGCAGGGATG GGTTTGGCAA 2640

AGCAAGTTCC CCCACAAGTT ATTCAACTGA AGGCCAGTGC CACACCAGTG TCAGTCAGAC 2700

AGTACCCCTT GAGTAAAGAA GCTCAAGAAG GAATTCGGCC GCATGTCCAA AGATTAATCC 2760

AACAGGGCAT CCTAGTTCCT GTCCAATCTC CCTGGAATAC TCCCCTGCTA CCGGTTAGAA 2820

AGCCTGGGAC TAATGACTAT CGACCAGTAC AGGACTTGAG AGAGGTCAAT AAACGGGTGC 2880

AGGATATACA CCCAACAGTC CCGAACCCTT ATAACCTCTT GTGTGCTCTC CCACCCCAAC 2940

GGAGCTGGTA TACAGTATTG GACTTAAAGG ATGCCTTCTT CTGCCTGAGA TTACACCCCA 3000

CTAGCCAACC ACTTTTTGCC TTCGAATGGA GAGATCCAGG TACGGGAAGA ACCGGGCAGC 3060

TCACCTGGAC CCGACTGCCC CAAGGGTTCA AGAACTCCCC GACCATCTTT GACGAAGCCC 3120

TACACAGAGA CCTGGCCAAC TTCAGGATCC AACACCCTCA GGTGACCCTC CTCCAGTACG 3180

TGGATGACCT GCTTCTGGCG GGAGCCACCA AACAGGACTG CTTAGAAGGC ACGAAGGCAC 3240

TACTGCTGGA ATTGTCTGAC CTAGGCTACA GAGCCTCTGC TAAGAAGGCC CAGATTTGCA 3300

GGAGAGAGGT AACATACTTG GGGTACAGTT TGCGGGACGG GCAGCGATGG CTGACGGAGG 3360

CACGGAAGAA AACTGTAGTC CAGATACCGG CCCCAACCAC AGCCAAACAA ATGAGAGAGT 3420

TTTTGGGGAC AGCTGGATTT TGCAGACTGT GGATCCCGGG GTTTGCGACC TTAGCAGCCC 3480

CACTCTACCC GCTAACCAAA GAAAAAGGGG AATTCTCCTG GGCTCCTGAG CACCAGAAGG 3540

CATTTGATGC TATCAAAAAG GCCCTGCTGA GCGCACCTGC TCTGGCCCTC CCTGACGTAA 3600

CTAAACCCTT TACCCTTTAT GTGGATGAGC GTAAGGGAGT AGCCCGGGGA GTTTTAACCC 3660

AAACCCTAGG ACCATGGAGA AGACCTGTCG CCTACCTGTC AAAGAAGCTC GATCCTGTAG 3720

CCAGTGGTTG GCCCATATGC CTGAAGGCTA TCGCAGCTGT GGCCATACTG GTCAAGGACG 3780

CTGACAAATT GACTTTGGGA CAGAATATAA CTGTAATAGC CCCCCATGCA TTGGAGAACA 3840

TCGTTCGGCA GCCCCCAGAC CGATGGATGA CCAACGCCCG CATGACCCAC TATCAAAGCC 3900

TGCTTCTCAC AGAGAGGGTC ACGTTCGCTC CACCAGCCGC TCTCAACCCT GCCACTCTTC 3960

TGCCTGAAGA GACTGATGAA CCAGTGACTC ATGATTGCCA TCAACTATTG ATTGAGGAGA 4020

CTGGGGTCCG CAAGGACCTT ACAGACATAC CGCTGACTGG AGAAGTGCTA ACCTGGTTCA 4080

CTGACGGAAG CAGCTATGTG GTGGAAGGTA AGAGGATGGC TGGGGCGGCG GTGGTGGACG 4140

GGACCCGCAC GATCTGGGCC AGCAGCCTGC CGGAAGGAAC TTCAGCACAA AAGGCTGAGC 4200

TCATGGCCCT CACGCAAGCT TTGCGGCTGG CCGAAGGGAA ATCCATAAAC ATTTATACGG 4260

ACAGCAGGTA TGCCTTTGCG ACTGCACACG TACATGGGGC CATCTATAAA CAAAGGGGGT 4320

TGCTTACCTC AGCAGGGAGG GAAATAAAGA ACAAAGAGGA AATTCTAAGC CTATTAGAAG 4380

CCGTACATTT ACCAAAAAGG CTAGCTATTA TACACTGTCC TGGACATCAG AAAGCTAAAG 4440

ATCTCATATC CAGAGGAAAC CAGATGGCTG ACCGGGTTGC CAAGCAGGCA GCCCAGGGTG 4500

TTAACCTTCT GCCTATAATA GAAATGCCCA AAGCCCCAGA ACCCAGACGA CAGTACACCC 4560

TAGAAGACTG GCAAGAGATA AAAAAGATAG ACCAGTTCTC TGAGACTCCG GAAGGGACCT 4620

GCTATACCTC AGATGGGAAG GAAATCCTGC CCCACAAAGA AGGGTTAGAA TATGTCCAAC 4680

AGATACATCG TCTAACCCAC CTAGGAACTA AACACCTGCA GCAGTTGGTC AGAACATCCC 4740

CTTATCATGT TCTGAGGCTA CCAGGAGTGG CTGACTCGGT GGTCAAACAT TGTGTGCCCT 4800

GCCAGCTGGT TAATGCTAAT CCTTCCAGAA TGCCTCCAGG GAAGAGACTA AGGGGAAGCC 4860

ACCCAGGCGC TCACTGGGAA GTGGACTTCA CTGAGGTAAA GCCGGCTAAA TACGGAAACA 4920

AATACCTATT GGTTTTTGTA GACACCTTTT CAGGATGGGT AGAGGCTTAT CCTACTAAGA 4980

AAGAGACTTC AACCGTGGTG GCTAAAAAAA TACTGGAAGA AATTTTTCCA AGATTTGGAA 5040

TACCTAAGGT AATAGGGTCA GACAATGGTC CAGCTTTTGT TGCCCAGGTA AGTCAGGGAC 5100

TGGCCAAGAT ATTGGGGATT GATTGGAAAC TGCATTGTGC ATACAGACCC CAAAGCTCAG 5160

GACAGGTAGA GAGGATGAAT AGAACCATTA AAGAGACCCT TACTAAATTG ACCGCGGAGA 5220

CTGGCGTTAA TGATTGGATA GCTCTCCTGC CCTTTGTGCT TTTTAGGGTT AGGAACACCC 5280

CTGGACAGTT TGGGCTGACC CCCTATGAAT TACTCTACGG GGGACCCCCC CCATTGGTAG 5340

AAATTGCTTC TGTACATAGT GCTGACGTGC TGCTTTCCCA GCCTTTGTTC TCTAGGCTCA 5400

AGGCACTTGA GTGGGTGAGA CAACGAGCGT GGAGGCAACT CCGGGAGGCC TACTCAGGAG 5460

GAGGAGACTT GCAGATCCCA CATCGTTTCC AAGTGGGAGA TTCAGTCTAC GTTAGACGCC 5520

ACCGTGCAGG AAACCTCGAG ACTCGGTGGA AGGGCCCTTA TCTCGTACTT TTGACCACAC 5580

CAACGGCTGT GAAAGTCGAA GGAATCTCCA CCTGGATCCA TGCATCCCAC GTTAAACCGG 5640

CGCCACCTCC CGATTCGGGG TGGAAAGCCG AAAAGACTGA AAATCCCCTT AAGCTTCGCC 5700

TCCATCGCGT GGTTCCTTAC TCTGTCAATA ACCTCTCAGA CTAATGGTAT GCGCATAGGA 5760

GACAGCCTGA ACTCCCATAA ACCCTTATCT CTCACCTGGT TAATTACTGA CTCCGGCACA 5820

GGTATTAATA TCAACAACAC TCAAGGGGAG GCTCCTTTAG GAACCTGGTG GCCTGATCTA 5880

TACGTTTGCC TCAGATCAGT TATTCCTAGT CTGACCTCAC CCCCAGATAT CCTCCATGCT 5940

CACGGATTTT ATGTTTGCCC AGGACCACCA AATAATGGAA AACATTGCGG AAATCCCAGA 6000

GATTTCTTTT GTAAACAATG GAACTGTGTA ACCTCTAATG ATGGATATTG GAAATGGCCA 6060

ACCTCTCAGC AGGATAGGGT AAGTTTTTCT TATGTCAACA CCTATACCAG CTCTGGACAA 6120

TTTAATTACC TGACCTGGAT TAGAACTGGA AGCCCCAAGT GCTCTCCTTC AGACCTAGAT 6180

TACCTAAAAA TAAGTTTCAC TGAGAAAGGA AAACAAGAAA ATATCCTAAA ATGGGTAAAT 6240

GGTATGTCTT GGGGAATGGT ATATTATGGA GGCTCGGGTA AACAACCAGG CTCCATTCTA 6300

ACTATTCGCC TCAAAATAAA CCAGCTGGAG CCTCCAATGG CTATAGGACC AAATACGGTC 6360

TTGACGGGTC AAAGACCCCC AACCCAAGGA CCAGGACCAT CCTCTAACAT AACTTCTGGA 6420

TCAGACCCCA CTGAGTCTAA CAGCACGACT AAAATGGGGG CAAAACTTTT TAGCCTCATC 6480

CAGGGAGCTT TTCAAGCTCT TAACTCCACG ACTCCAGAGG CTACCTCTTC TTGTTGGCTA 6540

TGCTTAGCTT CGGGCCCACC TTACTATGAA GGAATGGCTA GAAGAGGGAA ATTCAATGTG 6600

ACAAAAGAAC ATAGAGACCA ATGCACATGG GGATCCCAAA ATAAGCTTAC CCTTACTGAG 6660

GTTTCTGGAA AAGGCACCTG CATAGGAAAG GTTCCCCCAT CCCACCAACA CCTTTGTAAC 6720

CACACTGAAG CCTTTAATCA AACCTCTGAG AGTCAATATC TGGTACCTGG TTATGACAGG 6780

TGGTGGGCAT GTAATACTGG ATTAACCCCT TGTGTTTCCA CCTTGGTTTT TAACCAAACT 6840

AAAGATTTTT GCATTATGGT CCAAATTGTT CCCCGAGTGT ATTACTATCC CGAAAAAGCA 6900

ATCCTTGATG AATATGACTA CAGAAATCAT CGACAAAAGA GAGAACCCAT ATCTCTGACA 6960

CTTGCTGTGA TGCTCGGACT TGGAGTGGCA GCAGGTGTAG GAACAGGAAC AGCTGCCCTG 7020

GTCACGGGAC CACAGCAGCT AGAAACAGGA CTTAGTAACC TACATCGAAT TGTAACAGAA 7080

GATCTCCAAG CCCTAGAAAA ATCTGTCAGT AACCTGGAGG AATCCCTAAC CTCCTTATCT 7140

GAAGTAGTCC TACAGAATAG AAGAGGGTTA GATTTATTAT TTCTAAAAGA AGGAGGATTA 7200

TGTGTAGCCT TGAAGGAGGA ATGCTGTTTT TATGTGGATC ATTCAGGGGC CATCAGAGAC 7260

TCCATGAACA AGCTTAGAGA AAGGTTGGAG AAGCGTCGAA GGGAAAAGGA AACTACTCAA 7320

GGGTGGTTTG AGGGATGGTT CAACAGGTCT CTTTGGTTGG CTACCCTACT TTCTGCTTTA 7380

ACAGGACCCT TAATAGTCCT CCTCCTGTTA CTCACAGTTG GGCCATGTAT TATTAACAAG 7440

TTAATTGCCT TCATTAGAGA ACGAATAAGT GCAGTCCAGA TCATGGTACT TAGACAACAG 7500

TACCAAAGCC CGTCTAGCAG GGAAGCTGGC CGCTAGCTCT ACCAGTTCTA AGATTAGAAC 7560

TATTAACAAG AGAAGAAGTG GGGAATGAAA GGATGAAAAT ACAACCTAAG CTAATGAGAA 7620

GCTTAAAATT GTTCTGAATT CCAGAGTTTG TTCCTTATAG GTAAAAGATT AGGTTTTTTG 7680

CTGTTTTAAA ATATGCGGAA GTAAAATAGG CCCTGAGTAC ATGTCTCTAG GCATGAAACT 7740

TCTTGAAACT ATTTGAGATA ACAAGAAAAG GGAGTTTCTA ACTGCTTGTT TAGCTTCTGT 7800

AAAACTGGTT GCGCCATAAA GATGTTGAAA TGTTGATACA CATATCTTGG TGACAACATG 7860

TCTCCCCCAC CCCGAAACAT GCGCAAATGT GTAACTCTAA AACAATTTAA ATTAATTGGT 7920

CCACGAAGCG CGGGCTCTCG AAGTTTTAAA TTGACTGGTT TGTGATATTT TGAAATGATT 7980

GGTTTGTAAA GCGCGGGCTT TGTTGTGAAC CCCATAAAAG CTGTCCCGAC TCCACACTCG 8040

GGGCCGCAGT CCTCTACCCC TGCGTGGTGT ACGACTGTGG GCCCCAGCGC GCTTGGAATA 8100

AAAATCCTCT TGCTGTTTGC ATCAAAAAAA AA 8132
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
TGCCTAGAGA CATGTACTC 19
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5 : 
CCTCTTCTAG CCATTCCTTC A 21 
(2) INFORMATION FOR SEQ ID NO : 6 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TCGAGACTCG GTGGAAGGGC CC

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7 : 
GGGCCCTTCC ACCGAGTCTC GA 
(2) INFORMATION FOR SEQ ID NO : 8 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE : cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 8 : 

ACCTGGATCC ATGCATCCCA CG 

(2) INFORMATION FOR SEQ ID NO : 9 : 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 9 : 
CGTGGGATGC ATGGATCCAG GT 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE : cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 10 : 



GGCGCCACCT CCCGATTCGG
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCGAATCGGG AGGTGGCGCC
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE : cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TCCCCTTAAG CTTCGCCTCC 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GGAGGCGAAG CTTAAGGGGA 
(2) INFORMATION FOR SEQ ID NO : 14 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 14 : 
AAAAGCACAA AGGGCAGGAG AGC
(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GCTCTCCTGC CCTTTGTGCT TTT 23

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

CCTTTAGGAA CCTGGTGGCC 20

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

GGCCACCAGG TTCCTAAAGG 20

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CCCCCAGATA TCCTCCATGC 20

(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

GCATGGAGGA TATCTGGGGG

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

GCAGTTTCCA ATCAATCCCC AA
(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
TTGGGGATTG ATTGGAAACT GC 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
TTTATGTTTG CCCAGGACCA CCA 



(2) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

TGGTGGTCCT GGGCAAACAT AAA 

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 
GGGAGGTGGC GCCGGCTTAA CGT 
(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
ACGTTAAGCC GGCGCCACCT CCC 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CCCCCAACCC AAGGACCAGG ACCA 
(2) INFORMATION FOR SEQ ID NO: 27: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
TGGTCCTGGT CCTTGGGTTG GGGG
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
GCAGCACGAC TAAAATGGGG GC 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
GCCCCCATTT TAGTCGTGCT GC
(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

CCCCCATCCC ACCAACACCT

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
AGGTGTTGGT GGGATGGGGG 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TCTCCCCCAC CCCGAAACAT 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

ATGTTTCGGG GTGGGGGAGA 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

AGCCAAGAAA GCCAGGTCCC CGAA

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid



WO 97/21836 



-69- 



PCT/US96/19680



(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
TTCGGGGACC TGGCTTTCTT GGCT 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

AGGCTCTGGT GGCGGGTCTC C 

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GGAGACCCGC CACCAGAGCC T
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CCGCAGGGAT GGGTTTGGCA

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

TGCCAAACCC ATCCCTGCGG 20

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
GCTCACCTGG ACCCGACTGC CC 22
(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY : linear 



(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

GGGCAGTCGG GTCCAGGTGA GC 22

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GTTTACGGGA CGGGCAGCGA TGGC 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
GCCATCGCTG CCCGTCCCGT AAAC
(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 
TGGCTGGGGC GGCGGTGGTG GACGGG 
(2) INFORMATION FOR SEQ ID NO:45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: cDNA 



(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
CCCGTCCACC ACCGCCGCCC CAGCCA 
(2) INFORMATION FOR SEQ ID NO: 46:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
GCCCAAAGCC CCAGAACCCA GACG
(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
CGTCTGGGTT CTGGGGCTTT GGGC
(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
GATGAACAGG CAGACATCTG 
(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
CGCTTACAGA CAAGCTGTGA 20

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:

AGAACAAAGG CTGGGAAGC 19

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
ATAGGAGACA GCCTGAACTC 20

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

GGACCATTGT CTGACCCTAT 20

(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
GTCAACACCT ATACCAGCTC
(2) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
CATCTGAGGT ATAGCAGGTC
(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GCAGGTGTAG GAACAGGAAC 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
ACCTGTTGAA CCATCCCTCA
(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
CGAATGGAGA GATCCAGGTA
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
CCTGCATCAC TTCTCTTACC 
(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
TTGCCTGCTT GTGGAATACG 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

CAAGAGAAGA AGTGGGGAAT G 

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION : SEQ ID NO: 61: 
CACAGTCGTA CACCACGCAG 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: 
GGGAGACAGA AGAAGAAAGG 
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION : SEQ ID NO: 63: 
CGATAGTCAT TAGTCCCAGG
(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 
TGCTGGTTTG CATCAAGACC G 
(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
GTCGCAAAGG CATACCTGCT 
(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

ACAGAGCCTC TGCTAAGAAG 

(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
GCAGCTGTTG ACAATCATC
(2) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
TATGAGGAGA GGGCTTGACT 

(2) INFORMATION FOR SEQ ID NO: 69:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
AGCAGACGTG CTAGGAGGT 
(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:70: 
TCCTCTTGCT GTTTGCATC 

(2) INFORMATION FOR SEQ ID NO: 71:



(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
CAGACACTCA GAACAGAGAC 20

(2) INFORMATION FOR SEQ ID NO: 72: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
ACATCGTCTA ACCCACCTAG 20
(2) INFORMATION FOR SEQ ID NO: 73:



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
CTCGTTTCTG GTCATACCTG A 
(2) INFORMATION FOR SEQ ID NO:74: 



(i) SEQUENCE CHARACTERISTICS : 
(A) LENGTH: 19 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
GAGTACATCT CTCTAGGCA

What is claimed is:

1. A purified nucleic acid which can specifically hybridize with the sequence of
SEQ ID NO: 1 or its complement, provided that said nucleic acid is other than the entire
retroviral genome of SEQ ID NO: 1 or its complement.

2. The purified nucleic acid of claim 1, wherein said nucleic acid is at least one
nucleotide longer, or at least 1 nucleotide shorter, or differs in sequence at at least one
position from SEQ ID NO: 1 or its complement.

3. The purified nucleic acid of claim 1, wherein said nucleic acid has at least 72%
sequence identity or homology with a sequence from SEQ ID NO: 1 or its complement.

4. The purified nucleic acid of claim 1, wherein said nucleic acid is at least 15
nucleotides in length.

5. The purified nucleic acid of claim 1, wherein said nucleic acid can specifically 
hybridize with a translatable region of the retroviral genome of SEQ ID NO: 1, or its 
complement. 

6. The purified nucleic acid of claim 1, wherein said nucleic acid can specifically
hybridize with a region from the gag, pol, or env gene. 

7. The purified nucleic acid of claim f wherein said nucleic acid can specifically 
25 hybridize with an untranslated region of the retroviral genome of SEQ ID NO: 1 , or its 

complement. 

8. The purified nucleic acid of claim 1 , wherein said nucleic acid can specifically 
hybridize with a non-conserved region of the retroviral genome of SEQ ID NO: 1, or its 

30 complement. 

9. The purified nucleic acid of claim K wherein said nucleic acid can specifically 
hybridize with highly conserved regions of the retroviral genome of SEQ ID NO: 1 . or its 
complement. 



35 



10. The purified nucleic acid of claim 1 , wherein the nucleic acid is selected from 
the group consisting of SEQ ID NOs: 4-74. 
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1 1. A purified nucleic acid which hybridizes under stringent conditions to a nucleic 
acid chosen from; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense 
sequence from nucleotides 2-1999 of SEQ ID NO: I, or naturally occurring mutants thereof 
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from 
nucleotides 4871-8060 of SEQ ID NO: I, or naturally occurring mutants thereof, and a 
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from 
nucleotides 2452-4839 of SEQ ID NO:I, or naturally occurring mutants thereof. 

12. The purified nucleic acid of claim 1 1. wherein said nucleic acid is a nucleic 
acid of at least 10 consecutive nucleotides of sense or antisense sequence from nucleotides 
2-1999 of SEQ ID NO:l, or naturally occurring mutants thereof 

13. The purified nucleic acid of claim 1 1, wherein said nucleic acid is a nucleic 
acid of at least 10 consecutive nucleotides of sense or antisense sequence from nucleotides 
4871-8060 of SEQ ID NO:l, or naturally occurring mutants thereof 

14. The purified nucleic acid of claim 1 1, wherein said nucleic acid is a nucleic 
acid of at least 10 consecutive nucleotides of sense or antisense sequence from nucleotides 
2452-4839 of SEQ ID NO:l, or naturally occurring mutants thereof. 

15. A reaction mixture which includes a target nucleic acid and a second nucleic
acid, wherein the second nucleic acid is chosen from: a sequence which can specifically
hybridize to a porcine retroviral sequence; a sequence which can specifically hybridize to
the sequence of SEQ ID NO:1 or its complement; a sequence which can specifically
hybridize to the sequence of SEQ ID NO:2 or its complement; a sequence which can
specifically hybridize to the sequence of SEQ ID NO:3 or its complement; a nucleic acid
of at least 10 consecutive nucleotides of sense or antisense sequence which encodes a gag
protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence
from nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1,
nucleotides 598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides
585-2156 (e.g., from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof; a swine or
miniature swine retroviral nucleic acid; or a Tsukuba nucleic acid.

16. A purified nucleic acid which can specifically hybridize with the sequence of 
SEQ ID NO: 2 or its complement. 

17. The purified nucleic acid of claim 16, wherein said nucleic acid is at least one
nucleotide longer, or at least 1 nucleotide shorter, or differs in sequence at at least one
position from SEQ ID NO: 2 or its complement.

18. The purified nucleic acid of claim 16, wherein said nucleic acid has at least
72% sequence identity or homology with a sequence from SEQ ID NO: 2 or its
complement.

19. The purified nucleic acid of claim 16, wherein said nucleic acid is at least 15
nucleotides in length.

20. The purified nucleic acid of claim 16, wherein said nucleic acid can specifically
hybridize with a region from the gag, pol, or env gene.

21. The purified nucleic acid of claim 16, wherein said nucleic acid hybridizes
under stringent conditions to a nucleic acid chosen from: a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence from nucleotides 598-2169 of SEQ
ID NO:2, or naturally occurring mutants thereof, a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence from nucleotides 2320-4737 of SEQ ID NO:2,
or naturally occurring mutants thereof, and a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence from nucleotides 4738-6722 of SEQ ID NO:2,
or naturally occurring mutants thereof.

22. A purified nucleic acid which can specifically hybridize with the sequence of
SEQ ID NO: 3 or its complement.

23. The purified nucleic acid of claim 22, wherein said nucleic acid is at least one
nucleotide longer, or at least 1 nucleotide shorter, or differs in sequence at at least one
position from SEQ ID NO: 3 or its complement.

24. The purified nucleic acid of claim 22, wherein said nucleic acid has at least
72% sequence identity or homology with a sequence from SEQ ID NO: 3 or its
complement.

25. The purified nucleic acid of claim 22, wherein said nucleic acid is at least 15
nucleotides in length.

26. The purified nucleic acid of claim 22, wherein said nucleic acid can specifically
hybridize with a region from the gag, pol, or env gene.

27. The purified nucleic acid of claim 22, wherein said nucleic acid hybridizes
under stringent conditions to a nucleic acid chosen from: a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence from nucleotides 5620-7533 of SEQ
ID NO:3, or naturally occurring mutants thereof, a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence from nucleotides 585-2156 of SEQ ID NO:3, or
naturally occurring mutants thereof, and a nucleic acid of at least 3 consecutive nucleotides
of sense or antisense sequence from nucleotides 2307-5741 of SEQ ID NO:3, or naturally
occurring mutants thereof.

28. A method for screening a cell or a tissue for the presence or expression of a
swine or miniature swine retrovirus comprising:

contacting a target nucleic acid from the tissue with a second nucleic acid selected
from the group of: a sequence which can specifically hybridize to a porcine retroviral
sequence; a sequence which can specifically hybridize to the sequence of SEQ ID NO:1 or
its complement; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:2 or its complement; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:3 or its complement; a nucleic acid of at least 10 consecutive nucleotides of
sense or antisense sequence which encodes a gag protein; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence from nucleotides 2452-4839 (e.g.,
from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from
nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides
585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic acid of at
least 10 consecutive nucleotides of sense or antisense sequence which encodes a pol
protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence
from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737 of SEQ ID NO:2, or
nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring mutants thereof; a nucleic
acid of at least 10 consecutive nucleotides of sense or antisense sequence which encodes
an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID NO:1,
nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or nucleotides
5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof, under conditions in
which hybridization can occur, hybridization being indicative of the presence or expression
of an endogenous swine or miniature swine retrovirus or retroviral sequence in the tissue.

29. A method for screening a swine or miniature swine genome for the presence of
a porcine retrovirus, comprising:

contacting the miniature swine genomic DNA with a second nucleic acid selected
from the group of: a sequence which can specifically hybridize to a porcine retroviral
sequence; a sequence which can specifically hybridize to the sequence of SEQ ID NO:1 or
its complement; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:2 or its complement; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:3 or its complement; a nucleic acid of at least 10 consecutive nucleotides of
sense or antisense sequence which encodes a gag protein; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence from nucleotides 2452-4839 (e.g.,
from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from
nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides
585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof, under
conditions in which the sequences can hybridize,
hybridization being indicative of the presence of the endogenous porcine retroviral
sequence in the miniature swine genome.

30. A method of assessing the potential risk associated with the transplantation of a
graft from a donor swine or miniature swine into a recipient animal, comprising:

contacting a target nucleic acid from the donor, recipient or the graft, with a second
nucleic acid selected from the group of: a sequence which can specifically hybridize to a
porcine retroviral sequence; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:1 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:2 or its complement; a sequence which can specifically hybridize
to the sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a gag protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides
598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g.,
from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof, under
conditions in which the sequences can hybridize, hybridization being indicative of a risk
associated with the transplantation.

31. A method of providing a swine or miniature swine free of an activatable
retrovirus insertion at a preselected site, comprising:

performing a cross between a first miniature swine having a retroviral insertion at
the preselected site and a second miniature swine not having a retroviral insertion at a
preselected site, and recovering a progeny miniature swine, not having the insertion,
wherein the presence or absence of the retroviral insertion is determined by contacting the
genome of a miniature swine with a nucleic acid chosen from the group of: a sequence
which can specifically hybridize to a porcine retroviral sequence; a sequence which can
specifically hybridize to the sequence of SEQ ID NO:1 or its complement; a sequence
which can specifically hybridize to the sequence of SEQ ID NO:2 or its complement; a
sequence which can specifically hybridize to the sequence of SEQ ID NO:3 or its
complement; a nucleic acid of at least 10 consecutive nucleotides of sense or antisense
sequence which encodes a gag protein; a nucleic acid of at least 10 consecutive nucleotides
of sense or antisense sequence from nucleotides 2452-4839 (e.g., from nucleotides 3112-4683)
of SEQ ID NO:1, nucleotides 598-2169 (e.g., from nucleotides 598-2169) of SEQ ID
NO:2, or nucleotides 585-2156 (e.g., from nucleotides 585-2156) of SEQ ID NO:3, or
naturally occurring mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof.

32. A method of localizing the origin of a porcine retroviral infection, comprising:
contacting a target nucleic acid from the graft or organ with a second nucleic acid
selected from the group of: a sequence which can specifically hybridize to a porcine
retroviral sequence; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:1 or its complement; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:2 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence which encodes a gag protein; a nucleic acid of
at least 10 consecutive nucleotides of sense or antisense sequence from nucleotides 2452-4839
(e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from
nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides
585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof;
contacting a target nucleic acid from the recipient with a second nucleic acid selected from
the group of: a sequence which can specifically hybridize to a porcine retroviral sequence; a
sequence which can specifically hybridize to the sequence of SEQ ID NO:1 or its
complement; a sequence which can specifically hybridize to the sequence of SEQ ID NO:2
or its complement; a sequence which can specifically hybridize to the sequence of SEQ ID
NO:3 or its complement; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence which encodes a gag protein; a nucleic acid of at least 10 consecutive
nucleotides of sense or antisense sequence from nucleotides 2452-4839 (e.g., from
nucleotides 3112-4683) of SEQ ID NO:1, nucleotides 598-2169 (e.g., from nucleotides
598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g., from nucleotides 585-2156) of
SEQ ID NO:3, or naturally occurring mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof;
hybridization to the nucleic acid from the graft correlates with the porcine retroviral
infection in the graft; and hybridization to the nucleic acid from the recipient correlates
with the porcine retroviral infection in the recipient.

33. A method of screening a human subject for the presence or expression of an
endogenous porcine retrovirus comprising:

contacting a target nucleic acid derived from the human subject with a second
nucleic acid selected from the group of: a sequence which can specifically hybridize to a
porcine retroviral sequence; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:1 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:2 or its complement; a sequence which can specifically hybridize
to the sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a gag protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides
598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g.,
from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof, under
conditions in which the sequences can hybridize, hybridization being indicative of the
presence of the endogenous porcine retrovirus or retroviral sequences in the human subject.

34. A transgenic miniature swine having a transgenic element at an endogenous
porcine retroviral insertion site which corresponds to the retroviral genome of SEQ ID NO:
1, 2, or 3, and wherein said element alters the activity of the endogenous porcine retrovirus.

35. A method of detecting a recombinant virus or other pathogen, comprising: 
providing a pathogen having porcine retroviral sequence; and 

determining if the pathogen includes non-porcine retroviral sequence, the presence
of non-porcine retroviral sequence being indicative of viral recombination.

36. A method of determining the copy number, size, or completeness of a porcine 
retrovirus, comprising: 

contacting a target nucleic acid from the donor, recipient or a graft, with a second
nucleic acid selected from the group of: a sequence which can specifically hybridize to a
porcine retroviral sequence; a sequence which can specifically hybridize to the sequence of
SEQ ID NO:1 or its complement; a sequence which can specifically hybridize to the
sequence of SEQ ID NO:2 or its complement; a sequence which can specifically hybridize
to the sequence of SEQ ID NO:3 or its complement; a nucleic acid of at least 10
consecutive nucleotides of sense or antisense sequence which encodes a gag protein; a
nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence from
nucleotides 2452-4839 (e.g., from nucleotides 3112-4683) of SEQ ID NO:1, nucleotides
598-2169 (e.g., from nucleotides 598-2169) of SEQ ID NO:2, or nucleotides 585-2156 (e.g.,
from nucleotides 585-2156) of SEQ ID NO:3, or naturally occurring mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes a pol protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 4871-8060 of SEQ ID NO:1, nucleotides 2320-4737
of SEQ ID NO:2, or nucleotides 2307-5741 of SEQ ID NO:3, or naturally occurring
mutants thereof;
a nucleic acid of at least 10 consecutive nucleotides of sense or antisense sequence which
encodes an env protein; a nucleic acid of at least 10 consecutive nucleotides of sense or
antisense sequence from nucleotides 2-1999 (e.g., from nucleotides 86-1999) of SEQ ID
NO:1, nucleotides 4738-6722 (e.g., from nucleotides 4738-6722) of SEQ ID NO:2, or
nucleotides 5620-7533 of SEQ ID NO:3, or naturally occurring mutants thereof.

37. A method for screening a tissue for the presence or expression of a swine or a
miniature swine retroviral sequence comprising:

contacting a tissue sample with an antibody specific for a retroviral protein, thereby
determining if the sequence is present or expressed.

38. A purified nucleic acid which can specifically hybridize to a nucleic acid
sequence comprising nucleotides 2-1999 of SEQ ID NO: 1, nucleotides 4871-8060 of SEQ
ID NO: 1, or nucleotides 2452-4839 of SEQ ID NO: 1.
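
Several of the limitations recited in the claims above are defined purely by sequence: a run of at least 10 consecutive nucleotides of sense or antisense sequence taken from a numbered region of a SEQ ID NO, and a threshold of at least 72% sequence identity. The following Python sketch is not part of the claims and is offered only as an illustration of how such limitations might be checked. It assumes 1-based, inclusive nucleotide coordinates, treats the antisense sequence as the reverse complement of the sense sequence, and computes identity position by position over two ungapped sequences of equal length. The claims do not specify an alignment method, so that choice is an assumption, and all function names are hypothetical. Hybridization itself, including stringent conditions, is not modeled.

```python
# Illustrative sketch only; names and conventions here are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense (reverse-complement) strand of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def consecutive_windows(reference, start, end, length=10, antisense=False):
    """Yield every run of `length` consecutive nucleotides lying inside the
    1-based, inclusive region [start, end] of `reference`."""
    if start < 1 or end > len(reference) or length < 1:
        raise ValueError("region or length out of range")
    if end - start + 1 < length:
        raise ValueError("region is shorter than the requested length")
    region = reference[start - 1:end].upper()
    for i in range(len(region) - length + 1):
        window = region[i:i + length]
        yield reverse_complement(window) if antisense else window


def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity of two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    seq_72 = "ACATCGTCTAACCCACCTAG"      # SEQ ID NO: 72 from the listing above
    variant = "ACATCGTCTAACCCACCTAC"     # hypothetical single-base variant
    print(reverse_complement(seq_72))
    print(percent_identity(seq_72, variant) >= 72.0)
    toy_reference = "ACGTACGTACGTACG"    # toy stand-in, not a real SEQ ID NO
    print(list(consecutive_windows(toy_reference, 2, 13, length=10)))
```

A threshold test such as `percent_identity(candidate, reference_window) >= 72.0` mirrors the 72% figure recited in claims 3, 18, and 24 only under the ungapped assumption stated above. An alignment-based identity measure could give a different value for the same pair of sequences.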
OT-XSftGACIC GGTGGAAGGG aOCTTATCIC G^ACTTTTGA CX^CCAAC

GGCTC7IGAAA GTCGAAGGAA TCTCCACCIG GATCCATGCA TCCCACGITA 

AGCCGG03CC ACCTCCCGAT TCGGGCTQGA AM3CXEAAAA GACIGAAAAT 

CCCCTTAW3C TTCGCCITCA TOGCGrQGnT CCTTACICIG TCAATAACCT 

CTCAGACTAA TGGTATGCGC ATAGGAGACA OXTCAACIC CCATAAACCC 

TTATCICICA CXTCGITAAT TACTCACTCC GC£ACAGGTA TTAATATCAA 

CAACACTCAA GQGGAGGCTC CTTTAGGAAC CTGGIGGCCT GATCTATAQG 

TTIOXTCAG ATCAGTTATT CCTAGTCIGA CCTCACCOX XSATATCCTC 

CATCCICACG GATTTTATCT TTGCCCAGGA CCACCAAATA ATGGAAA^CA 

TTGCGGAAAT CCCAGAGATT TCTTTIGTAA ACWTGGAAC TGTCTAACCT 

CTAATCATCG ATATT3GAAA TOGOCAACCP CTCAGCAQGA TAGGGTTAPOT 

TOXTTATC TCAACACCTA TACC^GCTCT GGAGAATTTA ATTACCTCAC 

CTGGATTAGA ACIGGAAGCC CCAAGTCCTC TCCTTCAGAC CIAGATrACC 

TAAAAATAAG TTTCACTGAG AAAGGAAAAC AteAAAATAT CCT7\AAATOG 

GTAAATGGTA TGnPCTTGGGG AAT3GTATAT TATGGAGGCT CGGGTAAACA 

ACCAGGCIXX ATTCTAACTA TTCGCCTCAA AATAAACCAG CTX^AGCOTC 

CAATGGCTAT AGGACCAAAT AQ3GTCTTGA CGGCTCAAAG PCCCCC&CC 

CAAGG7CCAG GACCATCCTC TAACATAACT TCTCGATCAG ACCCCACIGA 

GTCTAGCAGC ACGACTAAAA TGGGGGCAAA ACTTTTTAGC CTCATCCAGG 

GAGCTTTTCA AGC7ICTTAAC TCCACGACIC CAGAGGCTAC C^CITC^ICr 

TGGCTATGCT TAGCTTTGGG COCACCTT^C TATGAMGAA TGGCTAGAAG 

AGQ3AAATTC AATCTGACAA AA3AACATAG A3ACCAATGC ACATGGGGAT 

CCCAAAATAA GCTTACCCTT ACTGAGGTTT CTGGAAAMG CACCTGCATA 

GGAAAGGTTC CCCCATCCCA CX^ACACCTT TGTAAOCACA CIGAAGCCIT 

TAA1CAAACC TCTGAAAGTC AATATCTOGT ACCIGGTTAT GACAGGTOGT 

QGGCATGTAA TACTGGATTA ACCCCTTGTG TTTCCACCTT GGTTTTTAAC 

FIGURE 1
CAAACTAAAG ATTTTTGCAT TATGGTCCAA ATTGITCOCC GAGTCTATTA 1350 (SEQ ID NO: 1) cont'd
CTATCCOGA*\ AAAGCAATCC TTGATGAATA TGACTAC£GA AATCATOGAC 1400
AA^AGAGAGA AOCCATATCT CIGACACTTC CTCTGATCCT OGGACTIQGA 1450
GTGGCAGCAG GTXTTAGGAAC AQGAACAGCT GCCCT3GTCA CGGGACCACA 1500
GCAGCTAGAA ACXX^CTTA GTAACCIAJGA TCGAATTCTA ACAGAA3ATC 1550
TGCAAGCCCT AGAAAAATCT GTCAGTAACC TGGAQGAATC OCTAACCICC 1600
TTATCTGAAG TACTCCTACA GAATAGAAGA GGGTTAGATT TATTATTTGT 1650
AAA c sGAAGGA GGATTATCTG TAGCCTTGAA GGAGGAATGC TCTTTTTTATG 1700
TGGATCATTC AGOGGCCATC AGAGACTCCA TGAACAAACT TAGAGAAAGG 1750
TTCTiAGAAGC GTOGAAGGGA AAAQGAAACT ACTCAAGGGT GGTTTGAGGG 1800
ATOGTlCAfiC AGGTCTCCTT GGTK3GCTAC CCTACTTTCT GCTTTAACAG 1850
GACCCTTAAT AGICCTQCTC CTGTTACTCA CAGTTGGGCC ATGTATTATT 1900
AACAAGTTAA TTCCCTTCAT TAGAGAACGA ATAAGTGCAG TCCAGATCAT 1950
GGTACTIAiGA C^ACAGTTACC AAAGCCOGTIC TAGCAGGGAA GCTGGCCGCT 2000
AGCTCTACCA GTTCTAAGAT TAGAACTATT AACA^GAGAA GAAGIGGGGA 2050
ATGAAAGGAT GAAAATACAA CCTA^GCTAA TGAGftftGCTT AAAATPGTTC 2100
TGAATTCCAG AG Ti ' luri CC TTATAGGTAA A^GATTAGGT TTTTTGCPGT 2150
TTTAAAATAT GOGGAAGTAA AATAGGCCCT GAGT^GATGT CTCTAGGCAT 2200
GAAACTICTT GAAACTATTT GAGATAACAA GAAAAGGGAG TTTCTAACTG 2250
CITCTTTAGC TTCTCTAAAA CTCGITCCGC CATAAAGATG TrGAAATGTTT 2300
GATACACATA TCTTOGTCAC AACATGTCIC CCCCACCCCG AAACATGOGC 2350
AAATCTGTAA CTCTAAAACA ATTTAAATTA ATTCOTCCAC GAAGCGOGGG 2400
CnXTTCGA^OT TTTAAATTGA CTOGTTTGTDG ATATTTTGAA ATGATTOJTT 2450
TGTAAAGCGC GGGCTTTCCT GTCAACCCCA TAAAAGCTGT OCOGACTCCA 2500
CACTCGGGGC CGCAGTOCTC TAOCCCTQQG TCGTGTACGA CTGVGGCTCC 2550



FIGURE 1. CONT. 



WO 97/21836 



PCT/US96/19680 



CACOGOGCTT GGAATAAAAA TCCTCTTGCT GTTT3CATCA K^ACCGCTTC 
TCGTGAGTGA TTAAGGGGT^G TOjCCTrTIC OGAGCCIGGA GGTTCTITTT 
GCTGGTICTTft CATTTOGQQG CTCGTCCGGG ATCTGTCGCG OCCACCCCTA 
/^CCCGAGA ACCGACTIGG AGGTAAAAAG GAfTOCTCTTT TTAACGK7TA 
"GCATCTACC GGCCGGCGTC TCTGTTTCPGA GTGTCTGTTT TCAGTGGTCC 

GOC<rrrroGG tttgcagctc txttctcagg ccgtaaqggc tgggggacig 

TGATCAGCAG ACGTCCTAGG AGGATCACAG GC7TGCTGCCC TGGGGGACGC 
CCCQGGAGGT GAGGAGAGCC AGGGACGCCT GGTOCTCTCC TACIGTCGGT 
CAGACGACCG AATIGTGTTC CIGAAGGGAA AGCTIOCCCC TCCGCGAOQG 
TlXGftCTCTT TTGCCTGCTT GTD3GAATAOG TOG,\CGGGTC ACGTGTGrCT 
GGATCTCTPG GTTTCIGTTTT TGTCTCTCXT GTO^TTGTCT 
ACAGTTTTAA TATGGGACAG ACGGTGACGA CCCCICTTAG TTTGACTCTC 
GACCATTGGA CTGAAGTTAA ATCCAGGGCT CATAATTTGT CAGTTCAGGT 
TAAGAAGGGA CXITTTGGCAGA CTTTCTGrTGT CTCTGAATGG CCGPCATICG 
ATGTFGGATG GO^TCAGAG GGGACCTTTA ATTCTGAGAT TATCCTG3CT 
GTTAAAGCAA TTATTT7TICA GACTOGACCC GGCTCTCATC CCGATCAGGA 
GCCCTATATC CTTACGTOGC AAGATTTOGC AGAGGATQCT CO30CATGGG 
TTAAACCATG GCTGAATAAG CCAAGAAAGC CAGCIOCCCG AATTCIGGCT 
CTTGGAGAGA AAAACAAACA CTCGGCIGAA AAAGTCAAGC CC7TCTCCTCA 
TATCTACCCC GAGATTGAGG AACCACCGGC TTCGCCGGAA QCCCAATCIG 
ttccoccao: COrrrATCTG GCACAG33TG CCGCGAGGGG acccittgcc 
CCTQCK3GAG CIGCGGGGGT GGAGQGACCT TCIGCAGGGA CTCGGAGCOG 
GAGGGGCGCC ACCCCGGAGC GGACAGACGA GATOGCGACA TTACOGCTGC 
GCACGTACXjG CCCTCCCACA CGGGGGGGCC AATIGCAGCC CCTCCAGTAT 
TOGCCCriTT CTTCTGCAGA TGTCFATAAT TGGAAAACTA ACCATCCCCC 
TT1CIGGGAG GATCCCCAAC GCCTCACGGG GTCGGrTGGAG TCCCTTAlOr
TCICICACCA GCCTACITOG GATGATTGIC AACAGCTGCT C3CftGACACTC 
TTCACAACCG AGGAGCGAGA GAGAATTCTA TTAGAG3CTA GAAAAAATGT 
TCCTOGGGCC GACGGGCGAC CC^OGOGGTr GCAAAATGAG ATIGACAT3G 
GATTTCCCTT AAOTXCCCX GGTIGGGACT ACAACAOGGC TGAAQGTAGG 
GAG^GCTTGA AAATCTATCG CCAGGCICIG GIGGCGGGIC T0CGGG3CGC 
CTCAAGACGG CCCACTAATT TGGCTAAGGT AAGAGAAGTG A^CPGGG^C 
CGAMGAACC COOCK^YT TTICTIGAGA GGCTCTTGGA AGCCTTCAGG 
CGCTACACXX: CTTITCATCC OOCTCAGftG GCCCAAAAAG CCICAGTGGC 
TXTCGCCTTT ATAGGACAGT C^3CCTTGGA TATTAGAAAG 7v^GCTICAGA 
GACIGGAAGG GTTACAGGAG GCTGAGTTAC GTGATCTAGT GAAGGMGCA 
GAGAAAGTAT ATTACAAAAG GGAGACAGAA i^jaGAAAGGG AACAAAGAAA 
AGAGAGAGAA AGAGAGGAAA GGGAGGAAAG ACGTAATAAA CGGCAAG?GA 
AGAATTIGAC TAAGATCTTG GCTGCAGIGG TIGA^GO^A AAGCAAT^GG 
GAAAGAGAGA GAGATITIM GAAAATTAGG TCAGGCCCTA GACAGICAGG 
GAACCIGGGC AATAGGACCC CACTCGACAA GGACCAATGT GCATATTGTA 
AAGAAAGAGG ACACTQGGCA AGGAACTGCC CCAAGAMGG AAACAAAGGA 
CCAAGGATCC TAQCTCIAGA AGAAGATAAA GATTAG3GGA GACGGGGITC 
GGACCCCCIC CCCG?GCCCA GQGTAACTIT GAAGGTGGAG GGGCAACCAG 
TIGAGTIGCT GGTFIGATACC GGAGCGAAAC ATTC^IGCT ACTACAGCCA 
TTAGGAAAAC TAAAAGATAA AAAATCCTGG GIGATGGGIG CACAGGGCAA 
CAACAGTATC CATGGACTAC CCGAAGACA3 TTGACITOGG AGHGGGAOGG 
GTAACCCACT OGTTTGIGGT CATACCTGAG TGCCCAGCAC OCCTCTTAGG 
TAGAGACTTA TIGACCAAGA TGQGAGCACA AATTICTTTT GAACAAGGGA 
AACCAGAAGr GTCIGCAAAT AACAAACCTA TCACIGTCTT GACCCTCCAA 
TTAGATCACG AATATGGACT AT/CTCTCCC CTAGTAAAGC CTCATCAAAA 5100 (SEQ ID NO: 1) cont'd
TATACAATIC TOGHGGAAC XJTITCCCCA AGCCIGGGCA GAAACCGCAG 5150
GGATGGGTTT GGCAAAGCAA GTTXXXCCAC AAGTTATTCA ACIGAMQOC 5200
AGIGCCACAC CAGTXrrCAGTT CAGACAGTAC CCCT7IGAGTA AAGAAGCTCA 5250
AGAA3GAATT cggccgcatc tccaa^gatt AATO^ACAG qqcatoctag 5300
TICCTCICCA ATCTCCCTGG AATACICCCC 'IGCTACCGGT TfcGAAAGOCT 5350
GGGACTAATG ACTATGGACC A3TTACAGGAC TTGAGAGAGG TCAATAAAOG 5400
CXnOCMGAT ATAGAGCCAA CAGTCCCGAA CCCTTATAPC CTXTTGrTCTIG 5450
CieiXXCACX: CXl^ACGGAGC TGCTATACAG TATTOGACTT AAAGGATOCC 5500
TICTIUIGCC TGAGATTAO\ CCCCPCTAGC CAftCX^CTTT TIGCCITOGA 5550
ATGGAGAGAT CCAGGTACGG GAAGAACCGG GCK3CTCACC TOGACCCGAC 5600
TCCCCCAAGG GTTCAAGAAC TCOCCGAQCA TLTi'iGSOGA AGCCCT^C&C 5650
AGAGACCIGG CC^ACTTCAG GATCCAACAC CCTX^GGIGA CCCICCTCCA 5700
GTACGTQGAT G^CCTGCITC TCGCGGGAGC CACCAAACAG GACTGCTTAG 5750
AAQGCACGAA CSGCACTKHG CIGGAATTCT CIGACCTAGG CTACPGAGCC 5800
TCTGCTAAGA AGGCCCAGAT TIGCAGGAGA GAGGTAACAT ACITGGGGTA 5850
CAGTTTADGG G-CGGGCAGC GATGGCTGAC OttGGCAGGG AAGAAAACIG 5900
T^OTCCAGAT ACCOGCCGCA ACCAGAGCCA AACAAATGAG AGA&jiTl/nG 5950
GCGACAGCTG GOTTIGCAG ACIGTGGATC CCGGGGTTIG CGACCTTAGC 6000
A3CCCCACIC TACCCGCTAA CCAAAGAAAA AGGGGAATTC TCCP3GGCIC 6050
CIGMCACCA GAAGGCATTT GATGCTATCA AAAAGGCQCT GCTGAGCGCA 6100
CCIGCICIGG CCCTCCCTCA CGTAACTAAA CCCITTACXX TTTATGIGGA 6150
TGAGCGTAAG GGAGTAGCCC GGGGAGTITr AACCCAAACC CTAGGACCAT 6200
GGAGAAGACC TGTGGCCTAC CTCTCAAAGA AGCIOGATCC TGTAGCCAGTT 6250
GC7TTGGOXA. TATGCCIGAA GGCTATCGCA GCiUlGGCCA TACTQGICAA 6300

FIGURE 1, CONT.
GGACGCTGAC AAATTGACTT TGGGACAAGA ATATAACTGT AAlAUUU_vJL
CAT3CATTQG AGAACATCGT TCGGCAGCCC CCAGALAJ^ai u^iba.uv\ 


6400 


cont'd


CGCOOGCATG ACCCACTATC AAAGCCTGCT TLILALAb.'Vj m^^^k^j^x 


6450 




TOGCTCCACC AACCGCICTC AACCCTGCCA CIXTITLnJUL. l^AALaftLuftLi 






G^TCAACCAG TGACICATCA TTGCCATCAA CTATlUAl /MJUiAljAL- !Uj 






GGTCOGCAAG GACCTTACAG ACATACCGCT GAClUoAbAA biULlAA^Ll 






GI7TTCACT3A CGGAAGCAGC TATCTGGTOG AAGGTAAGAG C-AIUJ^IUUU 


6650




G3GGCGGTGG TGGAOGGGAC COGCACGATC TCQGOCAG^A GCCTCCCJ3G 


6700




A3GAACTTCA GCACAAAAGG CTGAGCTCAT GOXOTCACG C£A£?L I i iUG 


6750




GGCIX3GCCGA AGGGAAATCC A.TAAACATTT ATACGGACAG C^GOTATGCC 


6800




TITQOGACIG CACAOGTACA TG3GGCCATC TATAAACAAA UUJL>jllUL.i 


6850




TPOCTC^i3CA GGGAG3GAAA TAAPGAACAk AGAGGAAAI i LiAa^.LiAi 


6900




TAGAAGCCGT ACATTTACGA AAAMGCTAG CTATTATACA CIGTCCDGGA 


6950




GMCAGAAAG CTAAAGATCT CATATCXAGA G3^AACXZPGA lUUClUAU^. 


7000 




GGTIGCCAAG C 7 ^GGCAGCCC AGQO 1\j 1 I iL illl i a l hl_t/w\ 


7050 




TGCXXAAAGC CCC^sGAACCXT AGACGACAI^ AU\LLLiiUjA /^jAL-il^^km 


7100 




GAGATAAAAA AGATAGACCA 1*1^1 uTGAGA GTCG32AAIJO t^ALl.ia.lA, 


7150




ACCTCAGATG GGAAGGAAAT CCTGCCCCAC AAAGAAGGGT TAGAATATCT' 


7200 




CCAACAAGAT ACATCGTCTA ACCCACC7TAG GAACTAAACA CCTX3CAGCAG 


7250 




TT03TICAGAA CATCCCCITA TCATCTTCTG AGGCTACGAG GAGTOGCIG^. 


7300 




CTIGGGTGGTC AAACATTGTG TGCCCTCCCA GCTGGTTAAT GCTAATCCTT 


7350 




CC^GAATCCC TCCAGGGA^G AGACTAAGQG GAAGCCAOGC AGGCGCTCAC 


7400 




TGGGAAGTIGG ACTTCACTGA GGTAAAGCXG GCTAAATATG GAAACAAATA 


7450 




CCTATIGGTT TTTGTAGACA CCTTTTCAGG ATOGGTAGAG GCI'IATCCIA 


7500 





CTAXjAAAGA GACTTCAACC GTGGTAGCTA AAAAAATACT GGA^GAAATT 7550 
TTTCCAAGAT TIQGAATACC TAAGGTAATA GGGTCAGACA ATCGTCCAGC
TlT l G T i GOC CAQGTAAGTC AGGGACTGQC CAAGATATTG GGGATTCATT 
GGAAACTGCA TTCTO2ATAC AGAC2CCAAA GCTCAGGACA GGTAGAGAGG 
ATGAATAGAA CCATTAAAGA GACCCTTACT AAATTGACCG CGGAGACTC3G 
CCTTAATGAT TGGATAGCTC TCXTIGOCCIT TGTOCTTTTT AGGCTITAGGA 
ACACXXCTGG ACAGTTTGGG CTCACCCCCT ATGAATTACT CTACGGGGGA 
OXOCCCCAT TGGTAGAAAT TGCTTCTGTA CATAGTGCTG ATG7GCT3CT 
TTCTTCTCTA GGCTGAA3GC ACTTGAGTCG GTGAGACAAC 
GA3CGTCGAG GCAACTCCGG GAGGCCTACT C^GGAGGAGG AGACITCCAG 
ATCCCACATX: GTTTCCAA3T GGGAGATTCA GICTACGTTA GACCOCACCG 
TGCAGGAAAC 



7600 (SEQ ID NO: 1)
cont'd

7650 
7700 
7750 
7800 
7850 
7900 
7950 

8000

8050 

8060 
10 20 30 40 50 60 (SEQ ID NO: 2)

ctacccctgc gtcgtgtacg actgtgggcc ccagcgcgct tccaataaaa atcctctigc 

70 80 90 100 110 120

* * * + * * * * 

TCITOOTC AftSACCGCIT CTTGTGAGTG ATTTGGGGTG TCGCCTCITC Q3AGCOCGGA 
130 140 150 160 170 180

CGAoSiGGGAT TCTTCTTTTA CTGCCCTTTC ATTIGGTGCG TTOGOCOOGA AATCCTGCGA 
190 200 210 220 230 240

CCAC^CCTTA CACCCGAGAA CCGACTIGGA GGTAAAGGGA TCCCCTTTSG AACATATGTG 
250 260 270 280 290 300

TGXGT03GCC GGCGTCTCTG TTCTGAGTGT CTCTTTTCGG TGATCCCCCC TTTCGGTITG 
310 320 330 340 350 360

CAGCTCTCCT CTCAGACCGT AAGGACTGGA C5GACTGTGAT C^GCAGACGT GCTAGGAGGA 
370 380 390 400 410 420 

TCACAOOCTG CCACCCK3GG GGACGCCCCG GGAGGTGGGG AGAGCCAGGG ACOCCTGGTG 
430 440 450 460 470 480

GXCTCCTACT GTCGGTCAGA GGACCGAGTT CTCTTGTTGA AGCGAAACCT TCCCCCTOCG 
490 500 510 520 530 540 

, * * * * * * 

CCOCCGTCCG ACTCTTTTGC CIGCTIGT03 aagacgcgga cgggtcgcgt ctctctcgat 
550 560 570 580 590 600

CTCTTGCTTT CTCTTTCGTC TCICTTTCTrC TTGIGCGTCC TrGTCTACAG TTTTAAT ATG 

Met>



610 



620 630 640 



GGA CAG ACA GTG ACT ACC CCC CTT AGT TTG ACT CTC GAC CAT TGG 

Gly Gln Thr Val Thr Thr Pro Leu Ser Leu Thr Leu Asp His Trp Thr>

650 660 670 680 690 

GAA GTT AGA TCC AGG GCT CAT AAT TTG TCA GTT CAG GTT AAG AAG GGA 
Glu Val Arg Ser Arg Ala His Asn Leu Ser Val Gln Val Lys Lys Gly>

700 710 720 730 740

* ***** 

CCT TGG CAG ACT TTC TGT GCC TCT GAA TGG CCA ACA TTC GAT GTT GGA
Pro Trp Gln Thr Phe Cys Ala Ser Glu Trp Pro Thr Phe Asp Val Gly>
750 760 770 780 790 (SEQ ID NO: 2)

Trp Pro Ser Glu Gly Thr Phe Asn Ser Glu Ile Ile Leu Ala Val Lys>
800 810 820 830 840

*■ * * * * * * 

OCA ATC ATT TTT CAG ACT GGA CCC GGC TCT CAT CCT GAT CAG GAG CCC
S S£ Ile Phe Gln Thr Gly Pro Gly Ser His Pro Asp Gln Glu Pro



850 



860 870 880 



TAT ATC CTT ACG TGG CAA GAT TTG GCA GAA GAT CCT CCG >rvo Val> 

Tyr Ile Leu Thr Trp Gln Asp Leu Ala Glu Asp Pro Pro Pro Trp Val>

890 900 910 920 930

^CCATOCrl^T^caAffiAMCaGGTCDrCGAATCOTGrr 
Lys Pro Trp Leu Asn Lys Pro Arg Lys Pro Gly Pro Arg Ile Leu Ala>

940 950 960 970 980

CTT GGA GAG AAA AAC AAA CAC TCG GCC GAA AAA GTC GAG CCC TCT 

Leu Gly Glu Lys Asn Lys His Ser Ala Glu Lys Val Glu Pro Ser Pro 



990 1000 1010 



1020 1030 



^ATCTACC^GMATCGMGBGCOGCCGACTTOOTG^CKOA 
Arg Ile Tyr Pro Glu Ile Glu Glu Pro Pro Thr Trp Pro Glu Pro Gln>

1040 1050 1060 1070 1080 



CCT GTT CCC CCA CCC CCT TAT CCA GCA CAG GGT GCT GTG AGG GGA 

Pro Val Pro Pro Pro Pro Tyr Pro Ala Gln Gly Ala Val Arg Gly Pro>



1090 



1100 1110 1120



TCTTOCCT^a^GCTCCGKGC^CKCSaCCrGCTGaCGGS^r 
Ser Ala Pro Pro Gly Ala Pro Val Val Glu Gly Pro Ala Ala Gly Thr> 

1130 1140 1150 1160 1170

CGG AGC CGG AGA GGC GCC ACC CCG GfC CGG ACA GAC GAG ATC GCG ATA
Arg Ser Arg Arg Gly Ala Thr Pro Glu Arg Thr Asp Glu Ile Ala Ile>



1180 



1190 1200 1210 1220 



TTA CCG CTG CGC ACC TAT GGC CCT CCC ATG CCA GGG GGC CAA TTG CAG 
Leu Pro Leu Arg Thr Tyr Gly Pro Pro Met Pro Gly Gly Gln Leu Gln>

1230 1240 1250 1260 1270

CCC CTC CAG TAT TGG CCC TTT TCT TCT GCA GAT CTC TAT AAT TGG AAA
Pro Leu Gln Tyr Trp Pro Phe Ser Ser Ala Asp Leu Tyr Asn Trp Lys>

1280 1290 1300 1310 1320

ACT AAC CAT CCC CCT TTC TCG GAG GAT CCC CAA CGC CTC ACG GGG TTG
Thr Asn His Pro Pro Phe Ser Glu Asp Pro Gln Arg Leu Thr Gly Leu>
1330 1340 1350 1360 (SEQ ID NO: 2)

cont'd 

GTG GAG TCC CTT ATG TTC TCT CAC CAG CCT ACT TGG GAT GAT TGT CAA
Val Glu Ser Leu Met Phe Ser His Gln Pro Thr Trp Asp Asp Cys Gln>



1370 1380 



1390 1400 1410



CAG CTG CTG CAG ACA CTC TTC ACA ACC GAG GAG CGA GAG AGA ATT CTG 
Gln Leu Leu Gln Thr Leu Phe Thr Thr Glu Glu Arg Glu Arg Ile Leu>

1420 1430 1440 1450 1460

TTA GAG GCT AAA AAA AAT GTT CCT GGG GCC GAC GGG CGA CCC ACG CAG 
Leu Glu Ala Lys Lys Asn Val Pro Gly Ala Asp Gly Arg Pro Thr Gln> 



1470 1480 



1490 1500 1510 



TTG CAA AAT GAG ATT GAC ATG GGA TTT CCC TTG ACT CGC CCC GCT TGG 
Leu Gln Asn Glu Ile Asp Met Gly Phe Pro Leu Thr Arg Pro Gly Trp>

1520 1530 1540 1550 1560 

. * + * 

GAC TAC AAC ACG GCT GAA GGT AGG GAG AGC TTG AAA ATC TAT CGC CAG
Asp Tyr Asn Thr Ala Glu Gly Arg Glu Ser Leu Lys Ile Tyr Arg Gln>



1570 



1580 1590 1600 



GCT CTG GTG GCG GGT CTC CGG GGC GCC TCA AGA CGG CCC ACT AAT TTG 
Ala Leu Val Ala Gly Leu Arg Gly Ala Ser Arg Arg Pro Thr Asn Leu>

1610 1620 1630 1640 1650 

* * * * 

GCT AAG GTA AGA GAG GTG ATG CAG GGA CCG AAC GAA CCT CCC TCG GTA
Ala Lys Val Arg Glu Val Met Gln Gly Pro Asn Glu Pro Pro Ser Val>

1660 1670 1680 1690 1700

TTT CTT GAG AGG CTC ATG GAA GCC TTC AGG CGG TTC ACC CCT TTT GAT
Phe Leu Glu Arg Leu Met Glu Ala Phe Arg Arg Phe Thr Pro Phe Asp> 



1710 



1720 1730 1740 1750



CCT ACC TCA GAG GCC CAG AAA GCC TCA GTG GCC CTG GCC TTC ATT GGG 
Pro Thr Ser Glu Ala Gln Lys Ala Ser Val Ala Leu Ala Phe Ile Gly>

1760 1770 1780 1790 1800

* * * * 

CAG TCG GCT CTG GAT ATC AGG AAG AAA CTT CAG AGA CTG GAA GGG TTA 
Gln Ser Ala Leu Asp Ile Arg Lys Lys Leu Gln Arg Leu Glu Gly Leu>



1810 



1820 1830 1840



CAG GAG GCT GAG TTA CGT GAT CTA GTG AGA GAG GCA GAG AAC GTG TAT 
Gln Glu Ala Glu Leu Arg Asp Leu Val Arg Glu Ala Glu Lys Val Tyr>



1850 



1860 1870 1880 1890



TAC AGA AGG GAG ACA GAA GAG GAG AAG GAA CAG AGA AAA GAA AAG GAG
Tyr Arg Arg Glu Thr Glu Glu Glu Lys Glu Gln Arg Lys Glu Lys Glu>
1900 1910 1920 1930 1940 (SEQ ID NO: 2)

cont'd

AGA GAA GAA AGG GAG GAA AGA CGT GAT AGA CGG CAA GAG AAG AAT TTG 
Arg Glu Glu Arg Glu Glu Arg Arg Asp Arg Arg Gln Glu Lys Asn Leu>

1950 1960 1970 1980 1990 



ACT AAG ATC TTG GCC GCA GTG GTT GAA GGG AAG AGO AGC AGG GAG AGA 
Thr Lys Ile Leu Ala Ala Val Val Glu Gly Lys Ser Ser Arg Glu Arg>

2000 2010 2020 2030 2040 



GAG AGA GAT TTT AGG AAA ATT AGG TCA GGC CCT AGA CAG TCA GGG AAC
Glu Arg Asp Phe Arg Lys Ile Arg Ser Gly Pro Arg Gln Ser Gly Asn>

2050 2060 2070 2080 

CTG GGC AAT AGG ACC CCA CTC GAC AAG GAC CAG TGT GCG TAT TGT AAA
Leu Gly Asn Arg Thr Pro Leu Asp Lys Asp Gln Cys Ala Tyr Cys Lys>



2090 



2100 2110 2120 2130 



GAA AAA GGA CAC TGG GCA AGG AAC TGC CCC AAG AAG GGA AAC AAA GGA 
Glu Lys Gly His Trp Ala Arg Asn Cys Pro Lys Lys Gly Asn Lys Gly>

2140 2150 2160 2170 2180 

CCG AAG GTC CTA GCT CTA GAA GAA GAT AAA GAT T AGGGGAGACG 
Pro Lys Val Leu Ala Leu Glu Glu Asp Lys Asp> 

2190 2200 2210 2220 2230 2240 

GGGTTCGGAC CCCCTCCCCG AGCCCAGGGT AACTTTGAAG GTGGAGGGGC AACX1AGTTGA 

2250 2260 2270 2280 2290 2300 

+ * * * * * * 

GTTCCTGGTT GATACCGGAG CGGAGCATTC ACTGCTGCTA CAACCATTAG GAAAACTAAA 

2310 2320 2330 2340 2350 

* * * 

AGAAAAAAAA TCCTGGGTG ATG GGT GCC ACA GGG GAA CGG CAG TAT CCA TGG 
Met Gly Ala Thr Gly Gln Arg Gln Tyr Pro Trp>

2360 2370 2380 2390 2400 

ACT ACC CGA AGA ACC GTT GAC TTG GGA GTG GGA CGG CTA ACC CAC TCG
Thr Thr Arg Arg Thr Val Asp Leu Gly Val Gly Arg Val Thr His Ser>

2410 2420 2430 2440 

****** * * * 

TTT CTG GTC ATC CCT GAG TGC CCA GTA CCC CTT CTA GGT AGA GAC TTA 
Phe Leu Val Ile Pro Glu Cys Pro Val Pro Leu Leu Gly Arg Asp Leu>



2450 



2460 2470 2480 2490



CTG ACC AAG ATG GGA GCT CAA ATT TCT TTT GAA CAA GGA ACA CCA GAA 
Leu Thr Lys Met Gly Ala Gln Ile Ser Phe Glu Gln Gly Arg Pro Glu>
2500 2510 2520 2530 2540 (SEQ ID NO: 2)

* * * cont'd 

GTC TCT GTG AAT AAC AAA CCC ATC ACT GTG TTG ACC CTC CAA TTA GAT 
Val Ser Val Asn Asn Lys Pro Ile Thr Val Leu Thr Leu Gln Leu Asp>

2550 2560 2570 2580 2590 

********* * T 

GAT GAA TAT CGA CTA TAT TCT CCC CAA GTA AAG CCT GAT CAA GAT ATA 
Asp Glu Tyr Arg Leu Tyr Ser Pro Gln Val Lys Pro Asp Gln Asp Ile>

2600 2610 2620 2630 2640 

********* 

CAG TCC TGG TTG GAG CAG TTT CCC CAA GCC TGG GCA GAA ACC GCA GGG
Gln Ser Trp Leu Glu Gln Phe Pro Gln Ala Trp Ala Glu Thr Ala Gly>

2650 2660 2670 2680

* »* + + * 

ATG GGT TTG GCA AAG CAA GTT CCC CCA CAG GTT ATT CAA CTG AAG GCC
Met Gly Leu Ala Lys Gln Val Pro Pro Gln Val Ile Gln Leu Lys Ala>

2690 2700 2710 2720 2730 

AGT GCT ACA CCA GTA TCA GTC AGA CAG TAC CCC TTG ACT AGA GAG GCT
Ser Ala Thr Pro Val Ser Val Arg Gln Tyr Pro Leu Ser Arg Glu Ala>

2740 2750 2760 2770 2780 

******** 

CGA GAA GGA ATT TGG CCG CAT GTT CAA AGA TTA ATC CAA CAG GGC ATC
Arg Glu Gly Ile Trp Pro His Val Gln Arg Leu Ile Gln Gln Gly Ile>

2790 2800 2810 2820 2830 

CTA GTT CCT GTC CAA TCC CCT TGG AAT ACT CCC CTG CTA CCG GTT AGG 
Leu Val Pro Val Gln Ser Pro Trp Asn Thr Pro Leu Leu Pro Val Arg>

2840 2850 2860 2870 2880

AAG CCT GGG ACC AAT GAT TAT CGA CCA GTA CAG GAC TTCTaGA GAG GTC 
Lys Pro Gly Thr Asn Asp Tyr Arg Pro Val Gln Asp Leu Arg Glu Val>

2890 2900 2910 2920 

AAT AAA AGG GTG CAG GAC ATA CAC CCA ACG GTC CCG AAC CCT TAT AAC 
Asn Lys Arg Val Gln Asp Ile His Pro Thr Val Pro Asn Pro Tyr Asn>

2930 2940 2950 2960 2970 

CTC TTG AGC GCC CTC CCG CCT GAA CGG AAC TGG TAC ACA GTA TTG GAC 
Leu Leu Ser Ala Leu Pro Pro Glu Arg Asn Trp Tyr Thr Val Leu Asp> 

2980 2990 3000 3010 3020 

**« w* T *r * * 

TTA AAA GAT GCC TTC TTC TGC CTG AGA TTA CAC CCC ACT AGC CAA CCA 
Leu Lys Asp Ala Phe Phe Cys Leu Arg Leu His Pro Thr Ser Gln Pro>

3030 3040 3050 3060 3070 

* * * * « **•*•* * 

CTT TTT ACC TTC GAA TGG AGA GAT CCA GGT ACG GGA AGA ACC GGG CAG 
Leu Phe Thr Phe Glu Trp Arg Asp Pro Gly Thr Gly Arg Thr Gly Gln>
3080 3090 3100 3110 3120 (SEQ ID NO: 2)

cont'd

CTC ACC TGG ACC CGA CTG CCC CAA GGG TTC AAG AAC TCC CCG ACC ATC
Leu Thr Trp Thr Arg Leu Pro Gln Gly Phe Lys Asn Ser Pro Thr Ile>

3130 3140 3150 3160 

* * * * * * * * 

TTT GAC GAA GCC CTA CAC AGG GAC CTG GCC AAC TTC AGG ATC CAA CAC 
Phe Asp Glu Ala Leu His Arg Asp Leu Ala Asn Phe Arg Ile Gln His>

3170 3180 3190 3200 3210 

CCT CAG CTG ACC CTC CTC CAG TAG GTG GAT GAC CTC CTT CTC GCG GGA 
Pro Gln Val Thr Leu Leu Gln Tyr Val Asp Asp Leu Leu Leu Ala Gly>

3220 3230 3240 3250 3260 

* * * * * + * -» * 

GCC ACC AAA CAG GAC TGC TTA GAA GGT ACG AAG GCA CTA CTG CTG GAA 
Ala Thr Lys Gln Asp Cys Leu Glu Gly Thr Lys Ala Leu Leu Leu Glu>

3270 3280 3290 3300 3310

* * * * * ' * * * 

TTG TCT GAC CTA GGC TAC AGA GCC TCT GCT AAG AAG GCC CAG ATT TGC
Leu Ser Asp Leu Gly Tyr Arg Ala Ser Ala Lys Lys Ala Gln Ile Cys>

3320 3330 3340 3350 3360

* * * * ****** 

ACG AGA GAG GTA ACA TAC TIG COG TAC TTG CGG GGC GGG CAG CGA 

Arg Arg Glu Val Thr Tyr Leu Gly Tyr Ser Leu Arg Gly Gly Gin Arg> 

3370 3380 3390 3400 

r * * * * ' * * 

TGC CTG ACG GAG GCA CGG AAG AAA ACT GTA CTC CAG ATA COG GCC CCA 
Trp Leu Thr Glu Ala Arg Lys Lys Thr Val Val Gln Ile Pro Ala Pro>

3410 3420 3430 3440 3450

* * * * * * » + * 

ACC ACA GCC AAA CAA GTG AGA GAG TTT TTG GGG ACA GCT GGA TTT TGC 
Thr Thr Ala Lys Gln Val Arg Glu Phe Leu Gly Thr Ala Gly Phe Cys>

3460 3470 3480 3490 3500 

♦ * * * * * * * * 

AGA CTG TGG ATC CCG GGG TTT GCG ACC TTA GCA GCC CCA CTC TAC CCG 
Arg Leu Trp Ile Pro Gly Phe Ala Thr Leu Ala Ala Pro Leu Tyr Pro>

3510 3520 3530 3540 3550 

» * * * * * * . .r * 

CTA ACC AAA GAA AAA GGG GGT TGC TTA CCT CAG CAG GGA GGG AAA TA AAG 
Leu Thr Lys Glu Lys Gly 

Lys Arg Gly Leu Leu Thr Ser Ala Gly Arg Glu lie Lys> 

3560 3570 3580 3590 3600

***** ***** 

AAC AAA GAG GAA ATT CTA AGC CTA TTA GAA GCC TTA CAT TTG CCA AAA 
Asn Lys Glu Glu Ile Leu Ser Leu Leu Glu Ala Leu His Leu Pro Lys>

3610 3620 3630 3640 3650 

* * * * • ** * «» 

AGG CTA GCT ATT ATA CAC TGT CCT GGA CAT CAG AAA GCC AAA GAT CTC 
Arg Leu Ala Ile Ile His Cys Pro Gly His Gln Lys Ala Lys Asp Leu>
3660 3670 3680 3690 (SEQ ID NO: 2)

* ***** cont'd

ATA TCT AGA GGG AAC CAG ATG GCT GAC CGG GTT GCC AAG CAG GCA
Ile Ser Arg Gly Asn Gln Met Ala Asp Arg Val Ala Lys Gln Ala Ala>

3700 3710 3720 3730 3740

CAG GCT GTT AAC CTT CTG CCT ATA ATA GAA ACG CCC AAA GCC CCA

Gln Ala Val Asn Leu Leu Pro Ile Ile Glu Thr Pro Lys Ala Pro Glu>

3750 3760 3770 3780 3790

CCC AGA Col CAG TAG ACC CTA GAA GAC TGG CAA GAG ATA AAA AAG ATA
Pro Arg Arg Gln Tyr Thr Leu Glu Asp Trp Gln Glu Ile Lys Lys Ile>

3800 3810 3820 3830 3840

GAC CAG TTC TCT GAG ACT CCG GAG GGG ACC TGC TAT ACC TCA TAT GCG 
Asp Gln Phe Ser Glu Thr Pro Glu Gly Thr Cys Tyr Thr Ser Tyr Gly>

3850 3860 3870 3880 3890

AAG GAA ATC CTG CCC CAC AAA GAA GGG TTA GAA TAT GTC CAA CAG ATA 
Lys Glu Ile Leu Pro His Lys Glu Gly Leu Glu Tyr Val Gln Gln Ile>

3900 3910 3920 3930

CAT CGT CTA ACC CAC CTA GGA ACT AAA CAC CTG CAG CAG TTG GTC AGA 
His Arg Leu Thr His Leu Gly Thr Lys His Leu Gln Gln Leu Val Arg>

3940 3950 3960 3970 3980 

ACA TCC CCT TAT CAT GTT CTG AGG CTA CCA GGA GTG GCT GAC TCC GTG 
Thr Ser Pro Tyr His Val Leu Arg Leu Pro Gly Val Ala Asp Ser Val>

3990 4000 4010 4020 4030

GTC AAA CAT TGT GTG CCC TGC CAG CTG GTT AAT GCT AAT CCT TCC AGA
Val Lys His Cys Val Pro Cys Gln Leu Val Asn Ala Asn Pro Ser Arg>

4040 4050 4060 4070 4080

ATA CCT CCA GGA AAG AGA CTA AGG GGA AGC CAC CCA GGC GCT CAC TGG
Ile Pro Pro Gly Lys Arg Leu Arg Gly Ser His Pro Gly Ala His Trp>

4090 4100 4110 4120 4130

GAA GTG GAC TTC ACT GAG GTA AAG CCG GCT AAA TAC GGA AAC AAA TAT 
Glu Val Asp Phe Thr Glu Val Lys Pro Ala Lys Tyr Gly Asn Lys Tyr>



4140 



4150 4160 4170



CTA TTG GTT TTT GTA GAC ACC TTT TCA GGA TGG GTA GAG GCT TAT CCT 
Leu Leu Val Phe Val Asp Thr Phe Ser Gly Trp Val Glu Ala Tyr Pro> 

4180 4190 4200 4210 4220

ACT AAA AAA GAG ACT TCA ACC GTG GTG GCT AAG AAA ATA CTG GAG GAA
Thr Lys Lys Glu Thr Ser Thr Val Val Ala Lys Lys Ile Leu Glu Glu>
4230 4240 4250 4260 4270 (SEQ ID NO: 2)

cont'd

ATT TTT CCA AGA TTT GGA ATA CCT AAG GTA ATA GGG TCA GAC AAT GGT 
Ile Phe Pro Arg Phe Gly Ile Pro Lys Val Ile Gly Ser Asp Asn Gly>

4280 4290 4300 4310 4320 

CCA GCT TTC GTT GCC CAG GTA ACT CAG GGA CTG GCC AAG ATA TTG GGG 
Pro Ala Phe Val Ala Gln Val Ser Gln Gly Leu Ala Lys Ile Leu Gly>

4330 4340 4350 4360 4370 ^ 4380 

***** ****** 
ATT GAT TG A AAA CTG CAT TGT GCA TAC AGA CCC CAA AGC TCA GGA CAG 
Ile Asp Lys Leu His Cys Ala Tyr Arg Pro Gln Ser Ser Gly Gln>

4380 4390 4400 4410 

+ * * * 

GTA GAG AGG ATG AAT AGA ACC ATT AAA GAG ACC CTT ACC AAA TTG ACC
Val Glu Arg Met Asn Arg Thr Ile Lys Glu Thr Leu Thr Lys Leu Thr>

4420 4430 4440 4450 4460 

ACA GAG ACT GGC ATT AAT GAT TGG ATG GCT CTC CTG CCC TTT GTG CTT 
Thr Glu Thr Gly Ile Asn Asp Trp Met Ala Leu Leu Pro Phe Val Leu>

4470 4480 4490 4500 4510

TTT AGG GTG AGG AAC ACC CCT GGA CAG TTT GGG CTG ACC CCC TAT AAA
Phe Arg Val Arg Asn Thr Pro Gly Gln Phe Gly Leu Thr Pro Tyr Lys>

4520 4530 4540 4550 4560 

TTG CTC TAC GGG GCA. CCC CCC CCG TTG GCA GAA ATT GCC TTT GCA CAT 
Leu Leu Tyr Gly Gly Pro Pro Pro Leu Ala Glu Ile Ala Phe Ala His>

4570 4580 4590 4600 4610 

* — - * 

AGT GCT GAT GTG CTG CTT TCC CAG CCT TTG TTC TCT AGG CTC AAG GCG 
Ser Ala Asp Val Leu Leu Ser Gln Pro Leu Phe Ser Arg Leu Lys Ala>

4620 4630 4640 4650 

CTC GAG TGG GTG AGG CAG CGA GCG TGG AAG CAG CTC CGG GAG GCC TAC
Leu Glu Trp Val Arg Gln Arg Ala Trp Lys Gln Leu Arg Glu Ala Tyr>

4660 4670 4680 4690 4700

* ** * * * 

TCA GGA GGA GAC TTG CAA GTT CCA CAT CGC TTC CAA GTT GGA GAT TCA 
Ser Gly Gly Asp Leu Gln Val Pro His Arg Phe Gln Val Gly Asp Ser>



4710 



4720 4730 4740 4750 



* * * - 

GTC TAT GTT AGA CGC CAC CGT GCA GGA AAC CTC GAG ACT CGG TAG AAG 
Val Tyr Val Arg Arg His Arg Ala Gly Asn Leu Glu Thr Arg Lys>

4760 4770 4780 4790 4800

GGA CCT TAT CTC GTA CTT TTG ACC ACA CCA ACG GCT GTG AAA GTC GAA
Gly Pro Tyr Leu Val Leu Leu Thr Thr Pro Thr Ala Val Lys Val Glu>
GGA ATC CCC TTA AGC TTC GCC TCC ATC GCG TGG TTC CTT ACT CTG TCA
Gly Ile Pro Leu Ser Phe Ala Ser Ile Ala Trp Phe Leu Thr Leu Ser>



4860 



4870 4880 4890



ATA ACT CCT CAA GTT AAT GGT AAA CGC CTT GTG GAC AGC CCG AAC TCC 
Ile Thr Pro Gln Val Asn Gly Lys Arg Leu Val Asp Ser Pro Asn Ser>

4900 4910 4920 4930 4940 

* * * ** * ** 

CAT AAA CCC TTA TCT CTC ACC TGG TTA CTT ACT GAC TCC GGT ACA GGT
His Lys Pro Leu Ser Leu Thr Trp Leu Leu Thr Asp Ser Gly Thr Gly> 

4950 4960 4970 4980 4990 

* * * * * 

* « * * * 

ATT AAT ATT AAC AGC ACT CAA GGG GAG GCT CCC TTG GGG ACC TGG TGG 
Ile Asn Ile Asn Ser Thr Gln Gly Glu Ala Pro Leu Gly Thr Trp Trp>

5000 5010 5020 5030 5040 

CCT GAA TTA TAT GTC TGC CTT CGA TCA GTA ATC CCT GGT CTC AAT GAC 
Pro Glu Leu Tyr Val Cys Leu Arg Ser Val Ile Pro Gly Leu Asn Asp>

5050 5060 5070 5080 5090 

* ■* * * * " lr 

CAG GCC ACA CCC CCC GAT GTA CTC CCT GCT TAC GGG TTT TAC GTT TGC
Gln Ala Thr Pro Pro Asp Val Leu Arg Ala Tyr Gly Phe Tyr Val Cys>



5100



5110 5120 5130 



CCA G^CTCCA \AT AAT GAA GAA TAT TGT GGA AAT CCT CAG GAT TTC 
Pro Gly Pro Pro Asn Asn Glu Glu Tyr Cys Gly Asn Pro Gln Asp Phe>

5140 5150 5160 5170 5180

TTT TGC AAG CAA TGG AGC TGC ATA ACT TCT AAT GAT GGG AAT TGG AAA
Phe Cys Lys Gln Trp Ser Cys Ile Thr Ser Asn Asp Gly Asn Trp Lys>



5190 



5200 5210 5220 5230 



TGG CCA GTC TCT CAG CAA GAC AGA GTA AGT TAC TCT TTT GTT AAC AAT 
Trp Pro Val Ser Gln Gln Asp Arg Val Ser Tyr Ser Phe Val Asn Asn>



5240 5250 



5260 5270 5280 



CCT ACC AGT TAT AAT CAA TTT AAT TAT GGC CAT GGG AGA TGG AAA GAT 
Pro Thr Ser Tyr Asn Gln Phe Asn Tyr Gly His Gly Arg Trp Lys Asp>

5290 5300 5310 5320 5330 

TGG CAA CAG CGG GTA CAA AAA GAT GTA CGA AAT AAG CAA ATA AGC TGT 
Trp Gln Gln Arg Val Gln Lys Asp Val Arg Asn Lys Gln Ile Ser Cys>

5340 5350 5360 5370 

CAT TCG TTA GAC CTA GAT TAC TTA AAA ATA AGT TTC ACT GAA AAA GGA 
His Ser Leu Asp Leu Asp Tyr Leu Lys Ile Ser Phe Thr Glu Lys Gly>
(SEQ ID NO: 2)

5380 5390 5400 5410 5420 cont'd

AAA CAA GAA AAT ATT CAA AAG TGG GTA AAT GGT ATA TCT TGG GGA ATA
Lys Gln Glu Asn Ile Gln Lys Trp Val Asn Gly Ile Ser Trp Gly Ile>

5430 5440 5450 5460 5470

GTG TAC TAT GGA CGC TCP GGG AGA AAG AAA GGA TCT GTT CTG ACT ATT
Val Tyr Tyr Gly Gly Ser Gly Arg Lys Lys Gly Ser Val Leu Thr Ile>

5480 5490 5500 5510 5520

„ * * * * 

CGC CTC AGA ATA GAA ACT CAG ATG GAA CCT CCG GTT GCT ATA GGA CCA
Arg Leu Arg Ile Glu Thr Gln Met Glu Pro Pro Val Ala Ile Gly Pro



5530 



5540 5550 5560 



AAT AAG GGT TTG GCC GAA CAA GGA CCT CCA ATC CAA GAA CAG 
Asn Lys Gly Leu Ala Glu Gln Gly Pro Pro Ile Gln Glu Gln>

5570 5580 5590 5600 5610

AGG CCA TCT CCT AAC CCC TCT GAT TAC AAT ACA ACC TCT GGA TCA < ^ < ~ 
Arg Pro Ser Pro Asn Pro Ser Asp Tyr Asn Thr Thr Ser Gly Ser Val> 



5620 5630 



5640 5650 5660 



CCC AC- GAG CCT AAC ATC ACT ATT AAA ACA GGG GCC AAA CTT TTT AGC
Pro Thr Glu Pro Asn Ile Thr Ile Lys Thr Gly Ala Lys Leu Phe Ser>

5670 5680 5690 5700 

CTC ATC CAG GGA GCT TTT CAA GCT CTT AAC TCC AGG ACT CCA GAS GCT
Leu Ile Gln Gly Ala Phe Gln Ala Leu Asn Ser Thr Thr Pro Glu Ala>

5710 5720 5730 5740 5750

ACC T»~T TCT TGT TGG CTT TGC TTA GCT TCG GGC CCA CCT TAC TAT GAG
Thr Ser Ser Cys Trp Leu Cys Leu Ala Ser Gly Pro Pro Tyr Tyr Glu>

5760 5770 5780 5790 5800 

* * * * * 

GGA ATG GCT AGA GGA GGG AAA TTC AAT GTG ACA AAG GAA CAT AGA

Gly Met Ala Arg Gly Gly Lys Phe Asn Val Thr Lys Glu His Arg Asp> 

5810 5820 5830 ^ 5840 ^ 5850 

CAA TGT ACA TGG GGA TCC CAA AAT AAG CTT ACC CTT 

Gln Cys Thr Trp Gly Ser Gln Asn Lys Leu Thr Leu Thr Glu Val Ser>
5860 5870 5880 5890 5900 

* * * * 

GGA AAA GGC ACC TGC ATA GGG ATG GTT CCC CCA TCC CAC CAA CAC CTT 
Gly Lys Gly Thr Cys Ile Gly Met Val Pro Pro Ser His Gln His Leu>



5910 5920 



5930 5940 



TGT AAC CAC ACT GAA GCC TTT AAT CGA ACC TCT GAG AGT CAA TAT CTG 
Cys Asn His Thr Glu Ala Phe Asn Arg Thr Ser Glu Ser Gln Tyr Leu>
5950 5960 5970 5980 5990 cont'd

Val Pro Gly Tyr Asp Arg Trp Trp Ala Cys Asn i.-i y 

6000 6010 6020 6030 6040

TGT GTT TCC ACC TTC GTT TTC AAC CAA AC? AAA GAC TTT TGC GTT ATG
Cys Val Ser Thr Phe Val Phe Asn Gln Thr Lys Asp Phe Cys Val Met>

6050 6060 6070 6080 6090

CTC CAA ATT GTC CCC CGG GTG TAC TAC TAT CCC GAA AAA GCA GTC CTT
Val Gln Ile Val Pro Arg Val Tyr Tyr Tyr Pro Glu Lys Ala Val Leu>
6110 6120 6130 6140

CAT GAA TAT GAC TAT AGA TAT AAT CGG CCA AAA AGA GAG CCC ATA TCC
Asp Glu Tyr Asp Tyr Arg Tyr Asn Arg Pro Lys Arg Glu Pro Ile Ser>

6150 6160 6170 6180

CTG ACA CTA GCT CTA ATG CTC GGA TTG GGA GTG GCT GCA CCC GTC GCA
Leu Thr Leu Ala Val Met Leu Gly Leu Gly Val Ala Ala Gly w. Cry-

6190 6200 6210 6220 6230

ACA GGA ACG GCT GCC CTA ATC ACA GGA CCG CAA CAG CTG GAG AAA GGA
Thr Gly Thr Ala Ala Leu Ile Thr Gly Pro Gln Gln Leu Glu Lys Gly>

6240 6250 6260 6270 6280

CTT AGT AAC CTA CAT CGA ATT GTA ACG GAA GAT CTC CAA GCC CTA CAA 
Leu Ser Asn Leu His Arg Ile Val Thr Glu Asp Leu Gln Ala Leu Gln>



6290 



6300 6310 6320 6330 



* ■* * 

rppc C^A ACC TCC TTA~TCT GAA GTG 
AAA TCT GTC AGT AAC CTG GAG GAA TCC C.^ al, 

Lys Ser Val Ser Asn Leu Glu Glu Se. ^on 

6340 6350 6360 6370 6380

GTT CTA CAG AAC AGA AGG GGG TTA GAT CTG TTA CTA Glv> 

Val Leu Gln Asn Arg Arg Gly Leu Asp Leu Leu Phe Leu Lys Glu Gly>



6390 



6400 6410 6420 



GGG TTA TGT GTA GCC TTA AAA GAG GAA TGC TGC TTC asd H^«~> 

Gly Leu Cys Val Ala Leu Lys Glu Glu Cys Cys Phe Tyr Val Asp

6430 6440 6450 6460 6470

TCA GGA GCC ATC AGA GAC TCC ATG AGC AAG CTT AGA GAA AGG TTA GAG 
Ser Gly Ala Ile Arg Asp Ser Met Ser Lys Leu Arg Glu Arg Leu Glu>

6480 6490 6500 6510 6520

AGG CGT CGA AGG GAA AGA GAG GCT GAC CAG GGG TGG TTT GAA GGA TGG
Arg Arg Arg Arg Glu Arg Glu Ala Asp Gln Gly Trp Phe Glu Gly Trp>
6540 6550 6560 6570 (SEQ ID NO: 2)

Phe Asn Arg Ser Pro Trp Met Thr Thr Leu Leu Ser Ala Leu Thr Gly>
6580 6590 6600 6610 6620

rrr CT\ CTA GTC CTG CTC CTG TTA CTT ACA GTT GGG CCT TGC TTA ATT
S S£ SSSSl«l«>** Val Gly Pro Cys Leu Ile>



6630 



6640 6650 6660 



AAT AGG TTT GTT GCC TTT GTT AGA GAA CGA ^f^" Va^ Gill lle> 

Asn Arg Phe Val Ala Phe Val Arg Glu Arg Val Ser Ala Val Gln Ile>

6670 6680 6690 6700 6710

ATG GTA CTT AGG CAA CW3 TAC CAA GGC CTT CTG AGC CAA GGA GAA ACT
Met Val Leu Arg Gln Gln Tyr Gln Gly Leu Leu Ser Gln Gly Glu Thr>

6720 6730 6740 6750 6760 6770

GAC OK TAGCCTTC CCAGTTCTAA GATTAGAACT ATTAACAAGA CAAGAAGTGG 
Asp Leu> 

6780 6790 6800 6810 6820 6830

OGAATGAAAG GATGAAAATG CAACCTAACC CTCCCAGAAC CCAGGAAGTT AATAAAAAGC 
6840 6850 6860 6870 6880 6890

TCTAAATGCC CCCGAATTCC AGACCCTGCT CGCTOCCAGT AAATAGGTAG AAGGTCACAC 
6910 6920 6930 6940 6950

•rTCCTATTGT TCCAGGGCCr GCTATCCTOG CCTAAGTAAG ATAACAGGAA ATGAGTTGAC 
6960 6970 6980 6990 7000 7010

TAATCCCTTA TCTGGATTCT GTAAAACTGA CTC3CACCAT AGAAGAATTG ATTACACATT 
7020 7030 7040 7050 7060 7070

GAC^CCCTA GTGACCTATC TCAACTGCAA TCTGTCACTC TGCCCAGGAG CCCACGCAGA 
7080 7090 7100 7110 7120 7130

TCCGGACCTC CGGAGCTATT TTAAAATGAT TGGTCCACGG AGCGCGGGCT CTCGATATTT 
7140 7150 7160 7170 7180 7190 

TAAaI'IGATT GGTCCATGGA C^GCGGGCTC TCCATATTTT AAAATGATTG GTTTGTGACG 
7200 7210 7220 7230 7240 7250

CACAGCCTTT GTTGTGAACC CCATAAAAGC TGTCCCGATT CCGCACTCGG CXXXGCACTC 
7260 7270 7280 7290 7300 7310 (SEQ ID NO: 2)

cont'd 

CTCTACCCCT GCUIt3GTIX3TA CGACTGT3QG CCCCAGCGCG CTTGGAATAA AAATCCTCTT 

7320 7330 
GCTT7TTTGCA TCAAAAAAAA AAA 
10 20 30 40 50 60 (SEQ ID NO: 3)

GCGTGGTGTA CGACTGTGOG CCCCAGCGCG CTTGGAATAA AAATCCTCTT GCIUT1TGCA

70 80 90 100 110 120

TCAAGACCGC TTCTCGTGAG TGATTAAGGG GAGTQGCCTT TICCGAGCCT axAGG-iTerr

130 140 150 160 170 180

TTTCCTGGTC TTAO\TTTGG GGOCTCGTCC GGGATCTGTC GCGGCCACCC CTAACACCCG

190 200 210 220 230 240

AGAACCGACT 'It-yGAGGTAAA AAGGATCCTC TTTTTAACGT GTATGCATGT ACCGGCOGGC

250 260 270 280 290 300

GTCTCTGTTC TGAGTGTCTG TTTTCAGrGG TGCGOGCTTT CGGTTTGCAG CTGTCCTCTXT

310 320 330 340 350 360

AGGCCGTAAG GGCTGGGGGA. CTGTGATCAG CAGACGTGCT AGGAOGATCA CAGGCTGCTG

370 380 390 400 410 420

CCCTX3GGGGA CGCCCCGGGA GCCAGGGACG CCTGGTGCTC TCCTACTGTC

430 440 450 460 470 480

GGTCAGAGGA C03AATTCTG TTGCTTGAAGC GAAAGCTTCC CCCTCCGCGA



490 500 510 520 530 540 

CTTTTGCCTG CTTCTGGAAG ACGTGGACGG GTCACGTGTG TCTGGATCTG TTGGTTTCTG 

550 560 570 580 590

TTTTGTGTGT CTTTGTCTTG TGTGTCCTTG TCTACAGTTT TAAT ATG GGA CAG ACG 

Met Gly Gln Thr>

600 610 620 630 640 

GTG ACG ACC CCT CTT AGT TTG ACT CTC GAC CAT TGG ACT GAA GTT AAA 
Val Thr Thr Pro Leu Ser Leu Thr Leu Asp His Trp Thr Glu Val Lys> 

650 660 670 680 690

TCC AGG GCT CAT AAT TTG TCA GTT CAG GTT AAG AAG GGA CCT TGG CAG
Ser Arg Ala His Asn Leu Ser Val Gln Val Lys Lys Gly Pro Trp Gln>

700 710 720 730 740

* * * * * • * * * 

ACT TTC TGT GTC TCT GAA TGG CCG ACA TTC GAT GTT GGA TGG CCA TCA
Thr Phe Cys Val Ser Glu Trp Pro Thr Phe Asp Val Gly Trp Pro Ser>
750 760 770 780 (SEQ ID NO: 3)

cont'd

GAG GGG ACC TTT AAT TCT GAG ATT ATC CTG GCT GTT AAA GCA GTT ATT
Glu Gly Thr Phe Asn Ser Glu Ile Ile Leu Ala Val Lys Ala Val Ile>



790 



800 810 820 830 



TTT CAG ACT GGA CCC GGC TCT CAT CCC GAT CAG GAG CCC TAT ATC CTT 
Phe Gln Thr Gly Pro Gly Ser His Pro Asp Gln Glu Pro Tyr Ile Leu>



840 



850 860 870 880 



* * * - 

ACG TGG CAA GAT TTG GCA GAG GAT CCT CCG CCA TGG GTT AAA CCA TGG 
Thr Trp Gln Asp Leu Ala Glu Asp Pro Pro Pro Trp Val Lys Pro Trp>



890 900 



910 920 930 



-» •» * » 

CTG AAT AAG CCA AGA AAG CCA GCT CCC CCA ATT CTG COT CTT GGA GAG 
Leu Asn Lys Pro Arg Lys Pro Gly Pro Arg Ile Leu Ala Leu Gly Glu>

940 950 960 970 980

AAA AAC AAA CAC TCG GCT GAA AAA GTC AAG CCC TCT CCT CAT ATC TAC
Lys Asn Lys His Ser Ala Glu Lys Val Lys Pro Ser Pro His Ile Tyr>

990 1000 1010 1020 

CCC GAG ATT GAG GAG CCA CCG GCT TGG CCG GAA CCC CAA TCT GTT CCC 
Pro Glu Ile Glu Glu Pro Pro Ala Trp Pro Glu Pro Gln Ser Val Pro>

1030 1040 1050 1060 1070 

CCA CCC CCT TAT CTG GCA CAG GGT GCC GCG AGG GGA CCC TTT GCC CCT
Pro Pro Pro Tyr Leu Ala Gln Gly Ala Ala Arg Gly Pro Phe Ala Pro>



1080 



1090 1100 1110 1120



CCT GGA GCT CCG GCG GTG GAG GGA CCT GCT GCA GGG ACT CGG AGC CCG
Pro Gly Ala Pro Ala Val Glu Gly Pro Ala Ala Gly Thr Arg Ser Arg> 



1130 



1140 1150 1160 1170



AGG GGC GCC ACC CCG GAG CGG ACA GAC GAG ATC GCG ACA TTA CCG CTG 
Arg Gly Ala Thr Pro Glu Arg Thr Asp Glu Ile Ala Thr Leu Pro Leu>

1180 1190 1200 1210 1220

CGC ACG TAC GGC CCT CCC ACA CCG COG GGC CAA TTG CAG CCC CTC CAG
Arg Thr Tyr Gly Pro Pro Thr Pro Gly Gly Gln Leu Gln Pro Leu Gln>

1230 1240 1250 1260 

TAT TGG CCC TTT TCT TCT GCA GAT CTC TAT AAT TGG AAA ACT AAC CAT
Tyr Trp Pro Phe Ser Ser Ala Asp Leu Tyr Asn Trp Lys Thr Asn His> 

1270 1280 1290 1300 1310

CCC CCT TTC TCG GAG GAT CCC CAA CGC CTC ACG GGG TTG GTG GAG TCC
Pro Pro Phe Ser Glu Asp Pro Gln Arg Leu Thr Gly Leu Val Glu Ser>
1320 1330 1340 1350 1360 (SEQ ID NO: 3)

* * cont'd

CTT ATG TTC TCT CAC CAG CCT ACT TGG GAT GAT TGT CAA CAG CTG CTG
Leu Met Phe Ser His Gln Pro Thr Trp Asp Asp Cys Gln Gln Leu Leu>

1370 1380 1390 1400 1410 

CAG ACA CTC TTC ACA ACC GAG GAG CGA GAG AGA ATT CTA TTA GAG GCT 
Gln Thr Leu Phe Thr Thr Glu Glu Arg Glu Arg Ile Leu Leu Glu Ala>

1420 1430 1440 1450 1460

AGA AAA AAT GTT CCT GGG GCC GAC GGG CGA CCC ACG CGG TTG CAA AAT
Arg Lys Asn Val Pro Gly Ala Asp Gly Arg Pro Thr Arg Leu Gln Asn>

1470 1480 1490 1500



GAG ATT GAC ATG GGA TTT CCC TTA ACT CGC CCC GGT TGG GAC TAC AAC
Glu Ile Asp Met Gly Phe Pro Leu Thr Arg Pro Gly Trp Asp Tyr Asn>

1510 1520 1530 1540 1550

ACG GCT GAA GGT AGG GAG AGC TTG AAA ATC TAT CGC CAG GCT CTG GTG
Thr Ala Glu Gly Arg Glu Ser Leu Lys Ile Tyr Arg Gln Ala Leu Val>

1560 1570 1580 1590 1600 

GCG GGT CTC CGG GGC GCC TCA AGA CGG CCC ACT AAT TTG CCT AAG GTA 
Ala Gly Leu Arg Gly Ala Ser Arg Arg Pro Thr Asn Leu Ala Lys Val>

1610 1620 1630 1640 1650

AGA GAA GTG ATG CAG GGA CCG AAT GAA CCC CCC TCT GTT TTT CTT GAG
Arg Glu Val Met Gln Gly Pro Asn Glu Pro Pro Ser Val Phe Leu Glu>

1660 1670 1680 1690 1700

AGG CT~ rT TG GAA GCC TTC AGG CGG TAC ACC CCT TTT GAT CCC ACC TCA 
Arg Leu Leu Glu Ala Phe Arg Arg Tyr Thr Pro Phe Asp Pro Thr Ser>



1710 



1720 1730 1740 



GAG GCC CAA AAA GCC TCA GTG GCT TTG GCC TTT ATA GGA CAG TCA GCC 
Glu Ala Gln Lys Ala Ser Val Ala Leu Ala Phe Ile Gly Gln Ser Ala>

1750 1760 1770 1780 1790

TTG GAT A~T AGA AAG AAG CTT CAG AGA CTG GAA GGG TTA CAG GAG GCT

Leu Asp Ile Arg Lys Lys Leu Gln Arg Leu Glu Gly Leu Gln Glu Ala>



1800 1810 1820 1830 1840 

GAG TTA CGT GAT CTA GTG AAG GAG GCA GAG AAA GTA TAT TAC AAA AGG 
Glu Leu Arg Asp Leu Val Lys Glu Ala Glu Lys Val Tyr Tyr Lys Arg> 



1850 1860 1870 1880 1890 

* * * * 

GAG ACA GAA GAA GAA AGG GAA CAA AGA AAA GAG AGA GAA AGA GAG GAA 
Glu Thr Glu Glu Glu Arg Glu Gln Arg Lys Glu Arg Glu Arg Glu Glu> 

1900 1910 1920 1930 1940 (SEQ ID NO: 3) cont'd 

********** 

AGG GAG GAA AGA CGT AAT AAA CGG CAA GAG AAG AAT TTG ACT AAG ATC 
Arg Glu Glu Arg Arg Asn Lys Arg Gln Glu Lys Asn Leu Thr Lys Ile> 

1950 1960 1970 1980 

* * * * * * * * * 

TTG GCT GCA GTG GTT GAA GGG AAA AGC AAT ACG GAA AGA GAG AGA GAT 
Leu Ala Ala Val Val Glu Gly Lys Ser Asn Thr Glu Arg Glu Arg Asp> 

1990 2000 2010 2020 2030 

TTT AGG AAA ATT AGG TCA GGC CCT AGA CAG TCA GGG AAC CTG GGC AAT 
Phe Arg Lys Ile Arg Ser Gly Pro Arg Gln Ser Gly Asn Leu Gly Asn> 

2040 2050 2060 2070 2080 

* * * * * * * * * 

AGG ACC CCA CTC GAC AAG GAC CAA TGT CCA TAT TGT AAA GAA AGA GGA 
Arg Thr Pro Leu Asp Lys Asp Gln Cys Ala Tyr Cys Lys Glu Arg Gly> 

2090 2100 2110 2120 2130 

* * * * * * * * * * 

CAC TGG GCA AGG AAC TGC CCC AAG AAG GGA AAC AAA GGA CCA AGG ATC 
His Trp Ala Arg Asn Cys Pro Lys Lys Gly Asn Lys Gly Pro Arg Ile> 

2140 2150 2160 2170 2180 

CTA GCT CTA GAA GAA GAT AAA GAT TACG GGAGACGGGG TTCCGACCCC 
Leu Ala Leu Glu Glu Asp Lys Asp> 

2190 2200 2210 2220 2230 2240 

CTCCCCGAGC CCAGGGTAAC TTTGAAGGTG GAGGGGCAAC C°iGTTGAGTT CCTGGTTGAT 

2250 2260 2270 2280 2290 2300 

ACCGGAGCGA AACATTCAGT GCTACTACAG CCATTAGGAA AACTAAAAGA TAAAAAATCC 

2310 2320 2330 2340 2350 

* * * * * * * * * * 

TGGGTG ATG GGT GCC ACA GGG CAA CAA CAG TAT CCA TGG ACT ACC CGA AGA 
Met Gly Ala Thr Gly Gln Gln Gln Tyr Pro Trp Thr Thr Arg Arg> 

2360 2370 2380 2390 

**-* * * * * * * 

ACA GTT GAC TTG GGA GTG GGA CGG GTA ACC CAC TCC TTT CTG GTC ATA 
Thr Val Asp Leu Gly Val Gly Arg Val Thr His Ser Phe Leu Val Ile> 

2400 2410 2420 2430 2440 

CCT GAG TGC CCA GCA CCC CTC TTA GGT AGA GAC TTA TTG ACC AAG ATC 
Pro Glu Cys Pro Ala Pro Leu Leu Gly Arg Asp Leu Leu Thr Lys Met> 

2450 2460 2470 2480 2490 

* * * * * * * * * * 

GGA GCA CAA ATT TCT TTT GAA CAA GGG AAA CCA GAA GTG TCT GCA AAT 
Gly Ala Gln Ile Ser Phe Glu Gln Gly Lys Pro Glu Val Ser Ala Asn> 

2500 2510 2520 2530 2540 

AAC AAA CCT ATC ACT GTG TTG ACC CTC CAA TTA GAT GAC GAA TAT CGA 
Asn Lys Pro Ile Thr Val Leu Thr Leu Gln Leu Asp Asp Glu Tyr Arg> 



2550 2560 2570 2580 2590 (SEQ ID NO: 3) cont'd 

CTA TAC TCT CCC CTA GTA AAG CCT GAT CAA AAT ATA CAA TTC TGG TTG 
Leu Tyr Ser Pro Leu Val Lys Pro Asp Gln Asn Ile Gln Phe Trp Leu> 



2600 2610 2620 2630 

GAA CAG TTT CCC CAA GCC TGG GCA GAA ACC GCA GGG ATG GGT TTG GCA 
Glu Gln Phe Pro Gln Ala Trp Ala Glu Thr Ala Gly Met Gly Leu Ala> 



2640 2650 2660 2670 2680 

AAG CAA GTT CCC CCA CAA GTT ATT CAA CTG AAG GCC ACT GCC ACA CCA 
Lys Gln Val Pro Pro Gln Val Ile Gln Leu Lys Ala Ser Ala Thr Pro> 



2690 2700 2710 2720 2730 

GTG TCA GTC AGA CAG TAC CCC TTG AGT AAA GAA GCT CAA GAA GGA ATT 
Val Ser Val Arg Gln Tyr Pro Leu Ser Lys Glu Ala Gln Glu Gly Ile> 



2740 2750 2760 2770 2780 

CGG CCG CAT GTC CAA ;_\ TTA ATC CAA CAG GGC ATC CTA GTT CCT GTC 
Arg Pro His Val Gln Arg Leu Ile Gln Gln Gly Ile Leu Val Pro Val> 



2790 2800 2810 2820 2830 

CAA TCT CCC TGG AAT ACT CCC CTG CTA CCG GTT AGA AAG CCT GGG ACT 
Gln Ser Pro Trp Asn Thr Pro Leu Leu Pro Val Arg Lys Pro Gly Thr> 



2840 2850 2860 2870 

AAT GAC TAT GGA CCA GTA CAG GAC TTG ACA GAG GTC AAT AAA GGG GTG 
Asn Asp Tyr Arg Pro Val Gln Asp Leu Arg Glu Val Asn Lys Arg Val> 



2880 2890 2900 2910 2920 

CAG GAT ATA CAC CCA ACA GTC CCG AAC CCT TAT AAC CTC TTG TGT GCT 
Gln Asp Ile His Pro Thr Val Pro Asn Pro Tyr Asn Leu Leu Cys Ala> 



2930 2940 2950 2960 2970 

CTC CCA CCC CAA CGG AGC TGG TAT ACA GTA TTG GAC TTA AAG GAT GCC 
Leu Pro Pro Gln Arg Ser Trp Tyr Thr Val Leu Asp Leu Lys Asp Ala> 



2980 2990 3000 3010 3020 

TTC TTC TGG CTG AGA TTA CAC CCC ACT AGC CAA CCA CTT TTT GCC TTC 
Phe Phe Cys Leu Arg Leu His Pro Thr Ser Gln Pro Leu Phe Ala Phe> 



3030 3040 3050 3060 3070 

GAA TGG AGA GAT CCA GGT ACG GGA AGA ACC GGG CAG CTC ACC TGG ACC 
Glu Trp Arg Asp Pro Gly Thr Gly Arg Thr Gly Gln Leu Thr Trp Thr> 

3080 3090 3100 3110 (SEQ ID NO: 3) cont'd 

***** 

CGA CTG CCC CAA GGG TTC AAG AAC TCC CCG ACC ATC TTT GAC GAA GCC 
Arg Leu Pro Gln Gly Phe Lys Asn Ser Pro Thr Ile Phe Asp Glu Ala> 

3120 3130 3140 3150 3160 

***** .,+ *■ * 

CTA CAC AGA GAC CTG GCC AAC TTC AGG ATC CAA CAC CCT CAG GTG ACC 
Leu His Arg Asp Leu Ala Asn Phe Arg Ile Gln His Pro Gln Val Thr> 

3170 3180 3190 3200 3210 

<**-** **** * * 

CTC CTC CAG TAG GTG GAT GAC CTG CTT CTG GCG GGA GCC ACC AAA CAG 
Leu Leu Gln Tyr Val Asp Asp Leu Leu Leu Ala Gly Ala Thr Lys Gln> 

3220 3230 3240 3250 3260 

* ** * » * * * * 

GAC TGC TTA GAA GGC ACG AAG GCA CTA CTG CTG GAA TTG TCT GAC CTA 
Asp Cys Leu Glu Gly Thr Lys Ala Leu Leu Leu Glu Leu Ser Asp Leu> 

3270 3280 3290 3300 3310 

GGC TAC AGA GCC TCT GCT AAG AAG GCC CAG ATT TGC AGG AGA GAG GTA 
Gly Tyr Arg Ala Ser Ala Lys Lys Ala Gln Ile Cys Arg Arg Glu Val> 

3320 3330 3340 3350 

ACA TAC TTG GGG TAC ACT TTG CGG GAC GGG CAG CGA TGG CTG ACG GAG 
Thr Tyr Leu Gly Tyr Ser Leu Arg Asp Gly Gln Arg Trp Leu Thr Glu> 

3360 3370 3380 3390 3400 

GCA CGG AAG AAA ACT GTA CTC CAG ATA CCG GCC CCA ACC ACA GCC AAA 
Ala Arg Lys Lys Thr Val Val Gln Ile Pro Ala Pro Thr Thr Ala Lys> 

3410 3420 3430 3440 3450 

* * * * * * * * * * 

CAA ATG AGA GAG TTT TTG GGG ACA GCT GGA TTT TGC AGA CTG TGG ATC 
Gln Met Arg Glu Phe Leu Gly Thr Ala Gly Phe Cys Arg Leu Trp Ile> 

3460 3470 3480 3490 3500 

* * * -■ *•* * * » 

CCG GGG TTT GCG ACC TTA GCA GCC CCA CTC TAC CCG CTA ACC AAA GAA 
Pro Gly Phe Ala Thr Leu Ala Ala Pro Leu Tyr Pro Leu Thr Lys Glu> 

3510 3520 3530 3540 3550 

AAA GGG GAA TTC TCC TGG GCT CCT GAG CAC CAG AAG GCA TTT GAT GCT 
Lys Gly Glu Phe Ser Trp Ala Pro Glu His Gln Lys Ala Phe Asp Ala> 

3560 3570 3580 3590 

* * * »** » ^ * 

ATC AAA AAG GCC CTG CTG AGC GCA CCT GCT CTG GCC CTC CCT GAC GTA 
Ile Lys Lys Ala Leu Leu Ser Ala Pro Ala Leu Ala Leu Pro Asp Val> 

3600 3610 3620 3630 3640 

ACT AAA CCC TTT ACC CTT TAT GTG GAT GAG CGT AAG GGA GTA GCC CGG 
Thr Lys Pro Phe Thr Leu Tyr Val Asp Glu Arg Lys Gly Val Ala Arg> 

3650 3660 3670 3680 3690 (SEQ ID NO: 3) cont'd 

* ******** 

GGA GTT TTA ACC CAA ACC CTA GGA CCA TGG AGA AGA CCT GTC GCC TAC 
Gly Val Leu Thr Gln Thr Leu Gly Pro Trp Arg Arg Pro Val Ala Tyr> 

3700 3710 3720 3730 3740 

* 

CTG TCA AAG AAG CTC GAT CCT GTA GCC AGT GGT TGG CCC ATA TGC CTG 
Leu Ser Lys Lys Leu Asp Pro Val Ala Ser Gly Trp Pro Ile Cys Leu> 

3750 3760 3770 3780 3790 

* » * * * * ** * 
AAG GCT ATC GCA GCT GTG GCC ATA CTG GTC AAG GAC GCT GAC AAA TTG 
Lys Ala Ile Ala Ala Val Ala Ile Leu Val Lys Asp Ala Asp Lys Leu> 

3800 3810 3820 3830 

********* 

ACT TTG GGA CAG AAT ATA ACT GTA ATA GCC CCC CAT GCA TTG GAG AAC 
Thr Leu Gly Gln Asn Ile Thr Val Ile Ala Pro His Ala Leu Glu Asn> 

3840 3850 3860 3870 3880 

ATC GTT CGG CAG CCC CCA GAC CGA TGG ATG ACC AAC GCC CGC ATG ACC 
Ile Val Arg Gln Pro Pro Asp Arg Trp Met Thr Asn Ala Arg Met Thr> 

3890 3900 3910 3920 3930 

***** ***** 

CAC TAT CAA AGC CTG CTT CTC ACA GAG AGG GTC ACG TTC GCT CCA CCA 
His Tyr Gln Ser Leu Leu Leu Thr Glu Arg Val Thr Phe Ala Pro Pro> 

3940 3950 3960 3970 3980 

* 

GCC GCT CTC AAC CCT GCC ACT CTT CTG CCT GAA GAG ACT GAT GAA CCA 
Ala Ala Leu Asn Pro Ala Thr Leu Leu Pro Glu Glu Thr Asp Glu Pro> 

3990 4000 4010 4020 4030 

GTG ACT CAT GAT TGC CAT CAA CTA TTG ATT GAG GAG ACT GGG GTC CGC 
Val Thr His Asp Cys His Gln Leu Leu Ile Glu Glu Thr Gly Val Arg> 

4040 4050 4060 4070 

* 

AAG GAC CTT ACA GAC ATA CCG CTG ACT GGA GAA GTG CTA ACC TGG TTC 
Lys Asp Leu Thr Asp Ile Pro Leu Thr Gly Glu Val Leu Thr Trp Phe> 

4080 4090 4100 4110 4120 

******** 

ACT GAC GGA AGC AGC TAT GTG GTG GAA GGT AAG AGG ATG GCT GGG GCG 
Thr Asp Gly Ser Ser Tyr Val Val Glu Gly Lys Arg Met Ala Gly Ala> 

4130 4140 4150 4160 4170 

****** * 

GCG GTG GTG GAC GGG ACC CGC ACG ATC TGG GCC AGC AGC CTG CCG GAA 
Ala Val Val Asp Gly Thr Arg Thr Ile Trp Ala Ser Ser Leu Pro Glu> 

4180 4190 4200 4210 4220 

GGA ACT TCA GCA CAA AAG GCT GAG CTC ATG GCC CTC ACG CAA GCT TTG 
Gly Thr Ser Ala Gln Lys Ala Glu Leu Met Ala Leu Thr Gln Ala Leu> 

4230 4240 4250 4260 4270 (SEQ ID NO: 3) cont'd 

CGG CTG GCC GAA GGG AAA TCC ATA AAC ATT TAT ACG GAC AGC AGG TAT 
Arg Leu Ala Glu Gly Lys Ser Ile Asn Ile Tyr Thr Asp Ser Arg Tyr> 



4280 4290 4300 4310 

* * * * 

GCC TTT GCG ACT GCA CAC GTA CAT GGG GCC ATC TAT AAA CAA AGG GGG 
Ala Phe Ala Thr Ala His Val His Gly Ala Ile Tyr Lys Gln Arg Gly> 

4320 4330 4340 4350 4360 

TTG CTT ACC TCA GCA GGG AGG GAA ATA AAG AAC AAA GAG GAA ATT CTA 
Leu Leu Thr Ser Ala Gly Arg Glu Ile Lys Asn Lys Glu Glu Ile Leu> 

4370 4380 4390 4400 4410 

AGC CTA TTA GAA GCC GTA CAT TTA CCA AAA AGG CTA GCT ATT ATA CAC 
Ser Leu Leu Glu Ala Val His Leu Pro Lys Arg Leu Ala Ile Ile His> 

4420 4430 4440 4450 4460 

* * * * * * * * 

TGT CCT GGA CAT CAG AAA GCT AAA GAT CTC ATA TCC AGA GGA AAC CAG 
Cys Pro Gly His Gln Lys Ala Lys Asp Leu Ile Ser Arg Gly Asn Gln> 

4470 4480 4490 4500 4510 

ATG GCT GAC CGG GTT GCC AAG CAG GCA GCC C"C CCT GTT AAC CTT CTG 
Met Ala Asp Arg Val Ala Lys Gln Ala Ala Gln Gly Val Asn Leu Leu> 

4520 4530 4540 4550 

CCT ATA ATA GAA ATG CCC AAA GCC CCA GAA CCC AGA CCA CAG TAC ACC 
Pro Ile Ile Glu Met Pro Lys Ala Pro Glu Pro Arg Arg Gln Tyr Thr> 

4560 4570 4580 4590 4600 

CTA GAA GAC TGG CAA GAG ATA AAA AAG ATA GAC CAG TTC TCT GAG ACT 
Leu Glu Asp Trp Gln Glu Ile Lys Lys Ile Asp Gln Phe Ser Glu Thr> 

4610 4620 4630 4640 4650 

********** 

CCG GAA GGG ACC TGC TAT ACC TCA GAT GGG AAG GAA ATC CTG CCC CAC 
Pro Glu Gly Thr Cys Tyr Thr Ser Asp Gly Lys Glu Ile Leu Pro His> 

4660 4670 4680 4690 4700 

AAA GAA GGG TTA GAA TAT GTC CAA CAG ATA CAT CGT CTA ACC CAC CTA 
Lys Glu Gly Leu Glu Tyr Val Gln Gln Ile His Arg Leu Thr His Leu> 

4710 4720 4730 4740 4750 

GGA ACT AAA CAC CTG CAG CAG TTG GTC AGA ACA TCC CCT TAT CAT GTT 
Gly Thr Lys His Leu Gln Gln Leu Val Arg Thr Ser Pro Tyr His Val> 

4760 4770 4780 4790 

********* 

CTG AGG CTA CCA GGA GTG GCT GAC TCC CTG GTC AAA CAT TCT GTG CCC 
Leu Arg Leu Pro Gly Val Ala Asp Ser Val Val Lys His Cys Val Pro> 

4800 4810 4820 4830 4840 (SEQ ID NO: 3) cont'd 

TGC CAG CTG GTT AAT CCT AAT CCT TCC AGA ATG CCT CCA GGG AAG AGA 
Cys Gln Leu Val Asn Ala Asn Pro Ser Arg Met Pro Pro Gly Lys Arg> 

4850 4860 4870 4880 4890 

CTA AGG GGA AGC CAC CCA GGC GCT CAC TGG GAA GTG GAC TTC ACT GAG 
Leu Arg Gly Ser His Pro Gly Ala His Trp Glu Val Asp Phe Thr Glu> 

4900 4910 4920 4930 4940 

******* 
GTA AAG CCG GCT AAA TAC GGA AAC AAA TAC CTA TTG GTT TTT GTA GAC 
Val Lys Pro Ala Lys Tyr Gly Asn Lys Tyr Leu Leu Val Phe Val Asp> 

4950 4960 4970 4980 4990 

* * * * * -* * * 

ACC TTT TCA GGA TGG GTA GAG GCT TAT CCT ACT AAG AAA GAG ACT TCA 
Thr Phe Ser Gly Trp Val Glu Ala Tyr Pro Thr Lys Lys Glu Thr Ser> 

5000 5010 5020 5030 

*»* - *** * 

ACC GTG GTG GCT AAA AAA ATA CTG GAA GAA ATT TTT CCA AGA TTT GGA 
Thr Val Val Ala Lys Lys Ile Leu Glu Glu Ile Phe Pro Arg Phe Gly> 

5040 5050 5060 5070 5080 

* * T ' * 

ATA CCT AAC CT:. _\TA GGG TCA GAC AAT GGT CCA GCT TTT GTT GCC CAG 
Ile Pro Lys Val Ile Gly Ser Asp Asn Gly Pro Ala Phe Val Ala Gln> 

5090 5100 5110 5120 5130 

GTA AGT CAG GGA CTG GCC AAG ATA TTG GGG ATT GAT TGG AAA CTG CAT 
Val Ser Gln Gly Leu Ala Lys Ile Leu Gly Ile Asp Trp Lys Leu His> 

5140 5150 5160 5170 5180 

+ ,* * * * * * 

TGT GCA TAC AGA CCC CAA AGC TCA GGA CAG GTA GAG AGG ATG AAT AGA 
Cys Ala Tyr Arg Pro Gln Ser Ser Gly Gln Val Glu Arg Met Asn Arg> 



5190 5200 5210 5220 5230 

ACC ATT AAA GAG ACC CTT ACT AAA TTG ACC GCG GAG ACT GGC GTT AAT 
Thr Ile Lys Glu Thr Leu Thr Lys Leu Thr Ala Glu Thr Gly Val Asn> 

5240 5250 5260 5270 

* * * * * * * 

GAT TGG ATA GCT CTC CTG CCC TTT GTG CTT TTT AGG GTT AGG AAC ACC 
Asp Trp Ile Ala Leu Leu Pro Phe Val Leu Phe Arg Val Arg Asn Thr> 

5280 5290 5300 5310 5320 

******** 
CCT GGA CAG TTT GGG CTG ACC CCC TAT GAA TTA CTC TAC GGG GGA CCC 
Pro Gly Gln Phe Gly Leu Thr Pro Tyr Glu Leu Leu Tyr Gly Gly Pro> 

5330 5340 5350 5360 5370 

CCC CCA TTG GTA GAA ATT GCT TCP GTA CAT AGT GCT GAC GTG CTG CTT 
Pro Pro Leu Val Glu Ile Ala Ser Val His Ser Ala Asp Val Leu Leu> 

5380 5390 5400 5410 5420 (SEQ ID NO: 3) cont'd 

TCC CAG CCT TTG TTC TCT AGG CTC AAG GCA CTT GAG TGG GTG AGA CAA 
Ser Gln Pro Leu Phe Ser Arg Leu Lys Ala Leu Glu Trp Val Arg Gln> 

5430 5440 5450 5460 5470 

***** 

CGA GCG TGG AGG CAA CTC CGG GAG GCC TAC TCA GGA GGA GGA GAC TTG 
Arg Ala Trp Arg Gln Leu Arg Glu Ala Tyr Ser Gly Gly Gly Asp Leu> 

5480 5490 5500 5510 

*»*-★** * * 

CAG ATC CCA CAT CGT TTC CAA GTG GGA GAT TCA GTC TAC GTT AGA CGC 
Gln Ile Pro His Arg Phe Gln Val Gly Asp Ser Val Tyr Val Arg Arg> 

5520 5530 5540 5550 5560 

** * » * * ** * * 

CAC CGT GCA GGA AAC CTC GAG ACT CGG TGG AAG GGC CCT TAT CTC GTA 
His Arg Ala Gly Asn Leu Glu Thr Arg Trp Lys Gly Pro Tyr Leu Val> 

5570 5580 5590 5600 5610 

* ♦ * * * * * 

CTT TTG ACC ACA CCA ACG GCT GTG AAA GTC GAA GGA ATC TCC ACC TGG 
Leu Leu Thr Thr Pro Thr Ala Val Lys Val Glu Gly Ile Ser Thr Trp> 

5620 5630 5640 5650 5660 

ATC CAT GCA TCC CAC GTT AAA CCG GCG CCA CCT CCC GAT TCG GGG TGG 

Met His Pro Thr Leu Asn Arg Arg His Leu Pro Ile Arg Gly Gly> 

Ile His Ala Ser His Val Lys Pro Ala Pro Pro Pro Asp Ser Gly Trp> 

5670 5680 5690 5700 5710 

AAA GCC GAA AAG ACT GAA AAT CCC CTT AAG CTT CCC CTC CAT CGC GTG 
Lys Pro Lys Arg Leu Lys Ile Pro Leu Ser Phe Ala Ser Ile Ala Trp> 

Lys Ala Glu Lys Thr Glu Asn Pro Leu Lys Leu Arg Leu His Arg Val> 

5720 5730 5740 5750 5760 

„ - * * * * 

GTT CCT TAC TCT GTC AAT AAC CTC TCA GAC T AAT GGT ATG CGC ATA GGA 
Phe Leu Thr Leu Ser Ile Thr Ser Gln Thr Asn Gly Met Arg Ile Gly> 

Val Pro Tyr Ser Val Asn Asn Leu Ser Asp> 

5770 5780 5790 5800 

* * » * * * * * * 

GAC AGC CTG AAC TCC CAT AAA CCC TTA TCT CTC ACC TGG TTA ATT ACT 
Asp Ser Leu Asn Ser His Lys Pro Leu Ser Leu Thr Trp Leu Ile Thr> 

5810 5820 5830 5840 5850 

GAC TCC GGC ACA GGT ATT AAT ATC AAC AAC ACT CAA GGG GAG GCT CCT 
Asp Ser Gly Thr Gly Ile Asn Ile Asn Asn Thr Gln Gly Glu Ala Pro> 

5860 5870 5880 5890 5900 (SEQ ID NO: 3) cont'd 

TTA GGA ACC TGG TGG CCT GAT CTA TAC GTT TGC CTC AGA TCA CTT ATT 
Leu Gly Thr Trp Trp Pro Asp Leu Tyr Val Cys Leu Arg Ser Val Ile> 

5910 5920 5930 5940 5950 

* 

CCT ACT CTG ACC TCA CCC CCA GAT ATC CTC CAT GCT CAC GGA TTT TAT 
Pro Ser Leu Thr Ser Pro Pro Asp Ile Leu His Ala His Gly Phe Tyr> 



5960 5970 5980 5990 6000 



GTT TGC CCA GGA CCA CCA AAT AAT GGA AAA CAT TGC GGA AAT CCC AGA 
Val Cys Pro Gly Pro Pro Asn Asn Gly Lys His Cys Gly Asn Pro Arg> 

6010 6020 6030 6040 

******* 

GAT TTC TTT TGT AAA CAA TGG AAC TGT CTA ACC TCT AAT GAT GGA TAT 
Asp Phe Phe Cys Lys Gln Trp Asn Cys Val Thr Ser Asn Asp Gly Tyr> 

6050 6060 6070 6080 6090 

***** 
TGG AAA TGG CCA ACC TCT CAG CAG GAT AGG GTA ACT TTT TCT TAT GTC 
Trp Lys Trp Pro Thr Ser Gln Gln Asp Arg Val Ser Phe Ser Tyr Val> 

6100 6110 6120 6130 6140 

AAC ACC TAT ACC AGC TCT GGA CAA TTT AAT TAG CTG ACC TGG ATT AGA 
Asn Thr Tyr Thr Ser Ser Gly Gln Phe Asn Tyr Leu Thr Trp Ile Arg> 

6150 6160 6170 6180 6190 

ACT GGA AGC CCC AAG TGC TCT CCT TCA GAC CTA GAT TAC CTA AAA ATA 
Thr Gly Ser Pro Lys Cys Ser Pro Ser Asp Leu Asp Tyr Leu Lys Ile> 

6200 6210 6220 6230 6240 

*— - * 

ACT TTC ACT GAG AAA GGA AAA CAA GAA AAT ATC CTA AAA TGG GTA AAT 
Ser Phe Thr Glu Lys Gly Lys Gln Glu Asn Ile Leu Lys Trp Val Asn> 

6250 6260 6270 6280 

GGT ATG TCT TGG GGA ATG GTA TAT TAT GGA GGC TCG GCT AAA CAA CCA 
Gly Met Ser Trp Gly Met Val Tyr Tyr Gly Gly Ser Gly Lys Gin Pro> 

6290 6300 6310 6320 6330 

******** 
GGC TCC ATT CTA ACT ATT CGC CTC AAA ATA AAC CAG CTG GAG CCT CCA 
Gly Ser Ile Leu Thr Ile Arg Leu Lys Ile Asn Gln Leu Glu Pro Pro> 

6340 6350 6360 6370 6380 

* 

ATG GCT ATA GGA CCA AAT ACG GTC TTG ACG GGT CAA AGA CCC CCA ACC 
Met Ala Ile Gly Pro Asn Thr Val Leu Thr Gly Gln Arg Pro Pro Thr> 

6390 6400 6410 6420 6430 

******** 

CAA GGA CCA GGA CCA TCC TCT AAC ATA ACT TCT GGA TCA GAC CCC ACT 
Gln Gly Pro Gly Pro Ser Ser Asn Ile Thr Ser Gly Ser Asp Pro Thr> 

6440 6450 6460 6470 6480 (SEQ ID NO: 3) cont'd 

********** 

GAG TCT AAC AGC ACG ACT AAA ATG GGG GCA AAA CTT TTT AGC CTC ATC 
Glu Ser Asn Ser Thr Thr Lys Met Gly Ala Lys Leu Phe Ser Leu Ile> 



6490 6500 6510 6520 

CAG GGA GCT TTT CAA GCT CTT AAC TCC ACG ACT CCA GAG GCT ACC TCT 
Gln Gly Ala Phe Gln Ala Leu Asn Ser Thr Thr Pro Glu Ala Thr Ser> 



6530 6540 6550 6560 6570 



TT TGT TGG CTA TGC TTA GCT TCG GGC CCA CCT TAG TAT GAA GGA ATG 
Ser Cys Trp Leu Cys Leu Ala Ser Gly Pro Pro Tyr Tyr Glu Gly Met> 

6580 6590 6600 6610 6620 

* 

GCT AGA AGA GGG AAA TTC AAT GTG ACA AAA GAA CAT AGA GAC CAA TGC 
Ala Arg Arg Gly Lys Phe Asn Val Thr Lys Glu His Arg Asp Gln Cys> 

6630 6640 6650 6660 6670 

ACA TGG GGA TCC CAA AAT AAG CTT ACC CTT ACT GAG GTT TCT GGA AAA 
Thr Trp Gly Ser Gln Asn Lys Leu Thr Leu Thr Glu Val Ser Gly Lys> 

6680 6690 6700 6710 6720 



GGC ACC TGC ATA GGA AAG GTT CCC CC 
CC CAC CAA CAC CTT TGT AAC 
Gly Thr Cys Ile Gly Lys Val Pro Pro Ser His Gln His Leu Cys Asn> 

6730 6740 6750 6760 

* 

CAC ACT GAA GCC TTT AAT CAA ACC TCT GAG ACT CAA TAT CTG GTA CCT 
His Thr Glu Ala Phe Asn Gln Thr Ser Glu Ser Gln Tyr Leu Val Pro> 

6770 6780 6790 6800 6810 

GGT TAT GAC AGG TGG TGG GCA TGT AAT ACT GGA TTA ACC CCT TCT GTT 
Gly Tyr Asp Arg Trp Trp Ala Cys Asn Thr Gly Leu Thr Pro Cys Val> 



6820 6830 6840 6850 6860 

TCC ACC TTG GTT TTT AAC CAA ACT AAA GAT TTT TGC ATT ATG GTC CAA 
Ser Thr Leu Val Phe Asn Gln Thr Lys Asp Phe Cys Ile Met Val Gln> 

6870 6880 6890 6900 6910 

ATT GTT CCC CGA GTG TAT TAC TAT CCC GAA AAA CCA ATC CTT GAT GAA 
Ile Val Pro Arg Val Tyr Tyr Tyr Pro Glu Lys Ala Ile Leu Asp Glu> 

6920 6930 6940 6950 6960 

* * * * * * 

* * * * 

TAT GAC TAC AGA AAT CAT CGA CAA AAG AGA GAA CCC ATA TCT CTG ACA 
Tyr Asp Tyr Arg Asn His Arg Gln Lys Arg Glu Pro Ile Ser Leu Thr> 

6970 6980 6990 7000 

„ > * * * " * 

CTT GCT GTG ATC CTC GGA CTT GGA GTG GCA GCA GGT GTA GCA ACA GGA 
Leu Ala Val Met Leu Gly Leu Gly Val Ala Ala Gly Val Gly Thr Gly> 

7010 7020 7030 7040 7050 (SEQ ID NO: 3) cont'd 

ACA GCT GCC CTG GTC ACG GGA CCA CAG CAG CTA GAA ACA GGA CTT AGT 
Thr Ala Ala Leu Val Thr Gly Pro Gln Gln Leu Glu Thr Gly Leu Ser> 

7060 7070 7080 7090 7100 

* * * * * * * * * 

AAC CTA CAT CGA ATT GTA ACA GAA GAT CTC CAA GCC CTA GAA AAA TCT 
Asn Leu His Arg Ile Val Thr Glu Asp Leu Gln Ala Leu Glu Lys Ser> 

7110 7120 7130 7140 7150 

* ^ + * * * » * + 

GTC AGT AAC CTG GAG GAA TCC CTA ACC TCC TTA TCT GAA GTA GTC CTA 
Val Ser Asn Leu Glu Glu Ser Leu Thr Ser Leu Ser Glu Val Val Leu> 

7160 7170 7180 7190 7200 

* ********* 

CAG AAT AGA AGA GGG TTA GAT TTA TTA TTT CTA AAA GAA GGA GGA TTA 
Gln Asn Arg Arg Gly Leu Asp Leu Leu Phe Leu Lys Glu Gly Gly Leu> 

7210 7220 7230 7240 

* * • * + * + 

TCT GTA GCC TTG AAG GAG GAA TGC TGT TTT TAT GTC GAT CAT TCA GGG 
Cys Val Ala Leu Lys Glu Glu Cys Cys Phe Tyr Val Asp His Ser Gly> 

7250 7260 7270 7280 7290 

* * * * * ■* * * * 

GCC ATC . GAC TCC ATG AAC AAG CTT AG*\ GAA AGG TTG GAG AAG CGT 
Ala Ile Arg Asp Ser Met Asn Lys Leu Arg Glu Arg Leu Glu Lys Arg> 

7300 7310 7320 7330 7340 

* * * * * * * * „ 

CGA AGG GAA AAG GAA ACT ACT CAA GGG TGC TTT GAG GGA TGG TTC AAC 
Arg Arg Glu Lys Glu Thr Thr Gln Gly Trp Phe Glu Gly Trp Phe Asn> 

7350 7360 7370 7380 7390 

* ***** 

AGG TCT CTT TGC TTG GCT ACC CTA CTT TCT GCT TTA ACA GGA CCC TTA 
Arg Ser Leu Trp Leu Ala Thr Leu Leu Ser Ala Leu Thr Gly Pro Leu> 

7400 7410 7420 7430 7440 

- 

ATA GTC CTC CTC CTG TTA CTC ACA GTT GGG CCA TGT ATT ATT AAC AAG 
Ile Val Leu Leu Leu Leu Leu Thr Val Gly Pro Cys Ile Ile Asn Lys> 

7450 7460 7470 7480 

* * ***** 

TTA ATT GCC TTC ATT AGA GAA CGA ATA AGT GCA GTC CAG ATC ATG GTA 
Leu Ile Ala Phe Ile Arg Glu Arg Ile Ser Ala Val Gln Ile Met Val> 

7490 7500 7510 7520 7530 

CTT AGA CAA CAC TAC CAA AGC CCG TCT AGC AGG GAA GCT GCC CCC 
Leu Arg Gln Gln Tyr Gln Ser Pro Ser Ser Arg Glu Ala Gly Arg> 

7540 7550 7560 7570 7580 7590 

TAGCTCT ACC^GTTCTA AGATTAGAAC TATTAACAAG AGAAGAAGTG GGGAATGAAA 

7600 7610 7620 7630 7640 7650 (SEQ ID NO: 3) cont'd 

* * * * * * * * 

GGATGAAAAT ACAACCTAAG CTAATGACAA GCTTAAAATT GTTCTGAATT CCAGAGTTTG 

7660 7670 7680 7690 7700 7710 

+ * * + * * * * * •** 

TTCCTTATAG GTAAAAGATT AQ ol ' iTlTl G CTGTTTTAAA ATATGCOGAA GTAAAATAGG 

7720 7730 7740 7750 7760 7770 

* * * * -** + * * * * *■ 

CCCTGAGTAC ATGTCTCTAG GCATGAAACT TCTTGAAACT ATTTGAG^TA ACAAGAAAAG 
7780 7790 7800 7810 7820 7830 

GOAGTTTCTA ACTGCTTGTT TAGCTTCTGT AAAACTQGTT GCGCCATAAA GATGTTGAAA 

7840 7850 7860 7870 7880 7890 

* * * * * * * * * 

TGTTGATACA CATATXTTTGG TGACAACATG TXTTCCCCCAC COCGAAACAT OCGCAAATGT 
7900 7910 7920 7930 7940 7950 

GTAACTCTAA AACAATTTAA ATTAATTGGT CCACCAAGCG CGGGCTCTCC AAGTTTTAAA 
7960 7970 7980 7990 8000 8010 

T.TGACTOGTT TGTGATATTT TGAAATGATT GGTTTGrTAAA GCG003GCTT TCTTCTGAAC 

8020 8030 8040 8050 8060 8070 

************ 

CCCATAAAAG CTGTTCCCGAC TOCACACTCG GOGCCQCAGT CCICTACCCC TGCGTGGTGT 
8080 8090 8100 8110 8120 8130 

* #■ * w » * * * * * * r 

ACGACTGTGG GCCCCAGCGC GCTTGGAATA AAAATCCTCT TGCTGTTTGC ATCAAAAAAA 
AA 